CONTENTS

Chapter 10. Network Scripting

10.1 "Tune in, Log on, and Drop out"

In the last few years, the Internet has virtually exploded onto the mainstream stage. It has rapidly grown from a simple communication device used primarily by academics and researchers into a medium that is now nearly as pervasive as the television and telephone. Social observers have likened the Internet's cultural impact to that of the printing press, and technical observers have suggested that all new software development of interest occurs only on the Internet. Naturally, time will be the final arbiter for such claims, but there is little doubt that the Internet is a major force in society, and one of the main application contexts for modern software systems.

The Internet also happens to be one of the primary application domains for the Python programming language. Given Python and a computer with a socket-based Internet connection, we can write Python scripts to read and send email around the world, fetch web pages from remote sites, transfer files by FTP, program interactive web sites, parse HTML and XML files, and much more, simply by using the Internet modules that ship with Python as standard tools.

In fact, companies all over the world do: Yahoo, Infoseek, Hewlett-Packard, and many others rely on Python's standard tools to power their commercial web sites. Many also build and manage their sites with the Zope web application server, which is itself written and customizable in Python. Others use Python to script Java web applications with JPython (a.k.a. "Jython") -- a system that compiles Python programs to Java bytecode, and exports Java libraries for use in Python scripts.

As the Internet has grown, so too has Python's role as an Internet tool. Python has proven to be well-suited to Internet scripting for some of the very same reasons that make it ideal in other domains. Its modular design and rapid turnaround mix well with the intense demands of Internet development. In this part of the book, we'll find that Python does more than simply support Internet scripts; it also fosters qualities such as productivity and maintainability that are essential to Internet projects of all shapes and sizes.

10.1.1 Internet Scripting Topics

Internet programming entails many topics, so to make the presentation easier to digest, I've split this subject over the next six chapters of this book. This chapter introduces Internet fundamentals and explores sockets, the underlying communications mechanism of the Internet. From there, later chapters move on to discuss the client, the server, web site construction, and more advanced topics.

Each chapter assumes you've read the previous one, but you can generally skip around, especially if you have any experience in the Internet domain. Since these chapters represent a big portion (about a third) of this book at large, the following sections go into a few more details about what we'll be studying.

10.1.1.1 What we will cover

In conceptual terms, the Internet can roughly be thought of as being composed of multiple functional layers:

Low-level networking layers

Mechanisms such as the TCP/IP transport mechanism, which deal with transferring bytes between machines, but don't care what they mean

Sockets

The programmer's interface to the network, which runs on top of physical networking layers like TCP/IP

Higher-level protocols

Structured communication schemes such as FTP and email, which run on top of sockets and define message formats and standard addresses

Server-side web scripting (CGI)

Higher-level client/server communication protocols between web browsers and web servers, which also run on top of sockets

Higher-level frameworks and tools

Third-party systems such as Zope and JPython, which address much larger problem domains

In this chapter and Chapter 11, our main focus is on programming the second and third layers: sockets and higher-level protocols. We'll start this chapter at the bottom, learning about the socket model of network programming. Sockets aren't strictly tied to Internet scripting, but they are presented here because this is their primary role. As we'll see, most of what happens on the Internet happens through sockets, whether you notice or not.

After introducing sockets, the next chapter makes its way up to Python's client-side interfaces to higher-level protocols -- things like email and FTP transfers, which run on top of sockets. It turns out that a lot can be done with Python on the client alone, and Chapter 11 will sample the flavor of Python client-side scripting. The next three chapters then go on to present server-side scripting (programs that run on a server computer and are usually invoked by a web browser). Finally, the last chapter in this part, Chapter 15, briefly introduces even higher-level tools such as JPython and Zope.

Along the way, we will also put to work some of the operating-system and GUI interfaces we've studied earlier (e.g., processes, threads, signals, and Tkinter), and investigate some of the design choices and challenges that the Internet presents.

That last statement merits a few more words. Internet scripting, like GUIs, is one of the sexier application domains for Python. As in GUI work, there is an intangible but instant gratification in seeing a Python Internet program ship information all over the world. On the other hand, by its very nature, network programming imposes speed overheads and user interface limitations. Though it may not be a fashionable stance these days, some applications are still better off not being deployed on the Net. In this part of the book, we will take an honest look at the Net's trade-offs as they arise.

The Internet is also considered by many to be something of an ultimate proof of concept for open source tools. Indeed, much of the Net runs on top of a large number of tools, such as Python, Perl, the Apache web server, the sendmail program, and Linux. Moreover, new tools and technologies for programming the Web sometimes seem to appear faster than developers can absorb.

The good news is that Python's integration focus makes it a natural in such a heterogeneous world. Today, Python programs can be installed as client-side and server-side tools, embedded within HTML code, used as applets and servlets in Java applications, mixed into distributed object systems like CORBA and DCOM, integrated with XML-coded objects, and more. In more general terms, the rationale for using Python in the Internet domain is exactly the same as in any other: Python's emphasis on productivity, portability, and integration make it ideal for writing Internet programs that are open, maintainable, and delivered according to the ever-shrinking schedules in this field.

10.1.1.2 What we won't cover

Now that I've told you what we will cover in this book, I should also mention what we won't cover. Like Tkinter, the Internet is a large domain, and this part of the book is mostly an introduction to core concepts and representative tasks, not an exhaustive reference. There are simply too many Python Internet modules to include each in this text, but the examples here should help you understand the library manual entries for modules we don't have time to cover.

I also want to point out that higher-level tools like JPython and Zope are large systems in their own right, and they are best dealt with in more dedicated documents. Because books on both topics are likely to appear soon, we'll merely scratch their surfaces here. Moreover, this book says almost nothing about lower-level networking layers such as TCP/IP. If you're curious about what happens on the Internet at the bit-and-wire level, consult a good networking text for more details.

10.1.1.3 Running examples in this part of the book

Internet scripts generally imply execution contexts that earlier examples in this book have not. That is, it usually takes a bit more to run programs that talk over networks. Here are a few pragmatic notes about this part's examples up front:

When a Python script opens an Internet connection (with the socket or protocol modules), Python will happily use whatever Internet link exists on your machine, be that a dedicated T1 line, a DSL line, or a simple modem. For instance, opening a socket on a Windows PC automatically initiates processing to create a dial-up connection to your Internet Service Provider if needed (on my laptop, a Windows modem connection dialog automatically pops up). In other words, if you have a way to connect to the Net, you likely can run programs in this chapter.

Moreover, as long as your machine supports sockets, you probably can run many of the examples here even if you have no Internet connection at all. As we'll see, a machine name "localhost" or "" usually means the local computer itself. This allows you to test both the client and server sides of a dialog on the same computer without connecting to the Net. For example, you can run both socket-based clients and servers locally on a Windows PC without ever going out to the Net.

Some later examples assume that a particular kind of server is running on a server machine (e.g., FTP, POP, SMTP), but client-side scripts work on any Internet-aware machine with Python installed. Server-side examples in Chapter 12, Chapter 13, and Chapter 14 require more: you'll need a web server account to code CGI scripts, and you must download advanced third-party systems like JPython and Zope separately (or find them by viewing http://examples.oreilly.com/python2).

In the Beginning There Was Grail

Besides creating the Python language, Guido van Rossum also wrote a World Wide Web browser in Python, named (appropriately enough) Grail. Grail was partly developed as a demonstration of Python's capabilities. It allows users to browse the Web much like Netscape or Internet Explorer, but can also be programmed with Grail applets -- Python/Tkinter programs downloaded from a server when accessed and run on the client by the browser. Grail applets work much like Java applets in more widespread browsers (more on applets in Chapter 15).

Grail is no longer under development and is mostly used for research purposes today. But Python still reaps the benefits of the Grail project, in the form of a rich set of Internet tools. To write a full-featured web browser, you need to support a wide variety of Internet protocols, and Guido packaged support for all of these as standard library modules that are now shipped with the Python language.

Because of this legacy, Python now includes standard support for Usenet news (NNTP), email processing (POP, SMTP, IMAP), file transfers (FTP), web pages and interactions (HTTP, URLs, HTML, CGI), and other commonly used protocols (Telnet, Gopher, etc.). Python scripts can connect to all of these Internet components by simply importing the associated library module.

Since Grail, additional tools have been added to Python's library for parsing XML files, OpenSSL secure sockets, and more. But much of Python's Internet support can be traced back to the Grail browser -- another example of Python's support for code reuse at work. At this writing, you can still find the Grail at http://www.python.org.

10.2 Plumbing the Internet

Unless you've been living in a cave for the last decade, you are probably already familiar with what the Internet is about, at least from a user's perspective. Functionally, we use it as a communication and information medium, by exchanging email, browsing web pages, transferring files, and so on. Technically, the Internet consists of many layers of abstraction and device -- from the actual wires used to send bits across the world to the web browser that grabs and renders those bits into text, graphics, and audio on your computer.

In this book, we are primarily concerned with the programmer's interface to the Internet. This too consists of multiple layers: sockets, which are programmable interfaces to the low-level connections between machines, and standard protocols, which add structure to discussions carried out over sockets. Let's briefly look at each of these layers in the abstract before jumping into programming details.

10.2.1 The Socket Layer

In simple terms, sockets are a programmable interface to network connections between computers. They also form the basis, and low-level "plumbing," of the Internet itself: all of the familiar higher-level Net protocols like FTP, web pages, and email, ultimately occur over sockets. Sockets are also sometimes called communications endpoints because they are the portals through which programs send and receive bytes during a conversation.

To programmers, sockets take the form of a handful of calls available in a library. These socket calls know how to send bytes between machines, using lower-level operations such as the TCP network transmission control protocol. At the bottom, TCP knows how to transfer bytes, but doesn't care what those bytes mean. For the purposes of this text, we will generally ignore how bytes sent to sockets are physically transferred. To understand sockets fully, though, we need to know a bit about how computers are named.

10.2.1.1 Machine identifiers

Suppose for just a moment that you wish to have a telephone conversation with someone halfway across the world. In the real world, you would probably either need that person's telephone number, or a directory that can be used to look up the number from his or her name (e.g., a telephone book). The same is true on the Internet: before a script can have a conversation with another computer somewhere in cyberspace, it must first know that other computer's number or name.

Luckily, the Internet defines standard ways to name both a remote machine, and a service provided by that machine. Within a script, the computer program to be contacted through a socket is identified by supplying a pair of values -- the machine name, and a specific port number on that machine:

Machine names

A machine name may take the form of either a string of numbers separated by dots called an IP address (e.g., 166.93.218.100), or a more legible form known as a domain name (e.g., starship.python.net). Domain names are automatically mapped into their dotted numeric address equivalent when used, by something called a domain name server -- a program on the Net that serves the same purpose as your local telephone directory assistance service.

Port numbers

A port number is simply an agreed-upon numeric identifier for a given conversation. Because computers on the Net can support a variety of services, port numbers are used to name a particular conversation on a given machine. For two machines to talk over the Net, both must associate sockets with the same machine name and port number when initiating network connections.

The combination of a machine name and a port number uniquely identifies every dialog on the Net. For instance, an Internet Service Provider's computer may provide many kinds of services for customers -- web pages, Telnet, FTP transfers, email, and so on. Each service on the machine is assigned a unique port number to which requests may be sent. To get web pages from a web server, programs need to specify both the web server's IP or domain name, and the port number on which the server listens for web page requests.

If this all sounds a bit strange, it may help to think of it in old-fashioned terms. In order to have a telephone conversation with someone within a company, for example, you usually need to dial both the company's phone number, as well as the extension of the person you want to reach. Moreover, if you don't know the company's number, you can probably find it by looking up the company's name in a phone book. It's almost the same on the Net -- machine names identify a collection of services (like a company), port numbers identify an individual service within a particular machine (like an extension), and domain names are mapped to IP numbers by domain name servers (like a phone book).

When programs use sockets to communicate in specialized ways with another machine (or with other processes on the same machine), they need to avoid using a port number reserved by a standard protocol -- numbers in the range of 0-1023 -- but we first need to discuss protocols to understand why.

10.2.2 The Protocol Layer

Although sockets form the backbone of the Internet, much of the activity that happens on the Net is programmed with protocols,[1] which are higher-level message models that run on top of sockets. In short, Internet protocols define a structured way to talk over sockets. They generally standardize both message formats and socket port numbers:

Raw sockets are still commonly used in many systems, but it is perhaps more common (and generally easier) to communicate with one of the standard higher-level Internet protocols.

10.2.2.1 Port number rules

Technically speaking, socket port numbers can be any 16-bit integer value between and 65,535. However, to make it easier for programs to locate the standard protocols, port numbers in the range of 0-1023 are reserved and preassigned to the standard higher-level protocols. Table 10-1 lists the ports reserved for many of the standard protocols; each gets one or more preassigned numbers from the reserved range.

Table 10-1. Port Numbers Reserved for Common Protocols

Protocol

Common Function

Port Number

Python Module

HTTP

Web pages

80

httplib

NNTP

Usenet news

119

nntplib

FTP data default

File transfers

20

ftplib

FTP control

File transfers

21

ftplib

SMTP

Sending email

25

smtplib

POP3

Fetching email

110

poplib

IMAP4

Fetching email

143

imaplib

Finger

Informational

79

n/a

Telnet

Command lines

23

telnetlib

Gopher

Document transfers

70

gopherlib

10.2.2.2 Clients and servers

To socket programmers, the standard protocols mean that port numbers 0-1023 are off-limits to scripts, unless they really mean to use one of the higher-level protocols. This is both by standard and by common sense. A Telnet program, for instance, can start a dialog with any Telnet-capable machine by connecting to its port 23; without preassigned port numbers, each server might install Telnet on a different port. Similarly, web sites listen for page requests from browsers on port 80 by standard; if they did not, you might have to know and type the HTTP port number of every site you visit while surfing the Net.

By defining standard port numbers for services, the Net naturally gives rise to a client/server architecture. On one side of a conversation, machines that support standard protocols run a set of perpetually running programs that listen for connection requests on the reserved ports. On the other end of a dialog, other machines contact those programs to use the services they export.

We usually call the perpetually running listener program a server and the connecting program a client. Let's use the familiar web browsing model as an example. As shown in Table 10-1, the HTTP protocol used by the Web allows clients and servers to talk over sockets on port 80:

Server

A machine that hosts web sites usually runs a web server program that constantly listens for incoming connection requests, on a socket bound to port 80. Often, the server itself does nothing but watch for requests on its port perpetually; handling requests is delegated to spawned processes or threads.

Clients

Programs that wish to talk to this server specify the server machine's name and port 80 to initiate a connection. For web servers, typical clients are web browsers like Internet Explorer or Netscape, but any script can open a client-side connection on port 80 to fetch web pages from the server.

In general, many clients may connect to a server over sockets, whether it implements a standard protocol or something more specific to a given application. And in some applications, the notion of client and server is blurred -- programs can also pass bytes between each other more as peers than as master and subordinate. For the purpose of this book, though, we usually call programs that listen on sockets servers, and those that connect, clients. We also sometimes call the machines that these programs run on server and client (e.g., a computer on which a web server program runs may be called a web server machine, too), but this has more to do with the physical than the functional.

10.2.2.3 Protocol structures

Functionally, protocols may accomplish a familiar task like reading email or posting a Usenet newsgroup message, but they ultimately consist of message bytes sent over sockets. The structure of those message bytes varies from protocol to protocol, is hidden by the Python library, and is mostly beyond the scope of this book, but a few general words may help demystify the protocol layer.

Some protocols may define the contents of messages sent over sockets; others may specify the sequence of control messages exchanged during conversations. By defining regular patterns of communication, protocols make communication more robust. They can also minimize deadlock conditions -- machines waiting for messages that never arrive.

For example, the FTP protocol prevents deadlock by conversing over two sockets: one for control messages only, and one to transfer file data. An FTP server listens for control messages (e.g., "send me a file") on one port, and transfers file data over another. FTP clients open socket connections to the server machine's control port, send requests, and send or receive file data over a socket connected to a data port on the server machine. FTP also defines standard message structures passed between client and server. The control message used to request a file, for instance, must follow a standard format.

10.2.3 Python's Internet Library Modules

If all of this sounds horribly complex, cheer up: Python's standard protocol modules handle all the details. For example, the Python library's ftplib module manages all the socket and message-level handshaking implied by the FTP protocol. Scripts that import ftplib have access to a much higher-level interface for FTPing files and can be largely ignorant of both the underlying FTP protocol, and the sockets over which it runs.[2]

In fact, each supported protocol is represented by a standard Python module file with a name of the form xxxlib.py, where xxx is replaced by the protocol's name, in lowercase. The last column in Table 10-1 gives the module name for protocol standard modules. For instance, FTP is supported by module file ftplib.py. Moreover, within the protocol modules, the top-level interface object is usually the name of the protocol. So, for instance, to start an FTP session in a Python script, you run import ftplib and pass appropriate parameters in a call to ftplib.FTP(); for Telnet, create a telnetlib.Telnet().

In addition to the protocol implementation modules in Table 10-1, Python's standard library also contains modules for parsing and handling data once it has been transferred over sockets or protocols. Table 10-2 lists some of the more commonly used modules in this category.

Table 10-2. Common Internet-Related Standard Modules

Python Modules

Utility

socket

Low-level network communications support (TCP/IP, UDP, etc.).

cgi

Server-side CGI script support: parse input stream, escape HTML text, etc.

urllib

Fetch web pages from their addresses (URLs), escape URL text

httplib, ftplib, nntplib

HTTP (web), FTP (file transfer), and NNTP (news) protocol modules

poplib, imaplib, smtplib

POP, IMAP (mail fetch), and SMTP (mail send) protocol modules

telnetlib, gopherlib

Telnet and Gopher protocol modules

htmllib, sgmllib, xmllib

Parse web page contents (HTML, SGML, and XML documents)

xdrlib

Encode binary data portably (also see the struct and socket modules)

rfc822

Parse email-style header lines

mhlib, mailbox

Process complex mail messages and mailboxes

mimetools, mimify

Handle MIME-style message bodies

multifile

Read messages with multiple parts

uu, binhex, base64, binascii, quopri

Encode and decode binary (or other) data transmitted as text

urlparse

Parse URL string into components

SocketServer

Framework for general net servers

BaseHTTPServer

Basic HTTP server implementation

SimpleHTTPServer, CGIHTTPServer

Specific HTTP web server request handler modules

rexec, bastion

Restricted code execution modes

We will meet many of this table's modules in the next few chapters of this book, but not all. The modules demonstrated are representative, but as always, be sure to see Python's standard Library Reference Manual for more complete and up-to-date lists and details.

More on Protocol Standards

If you want the full story on protocols and ports, at this writing you can find a comprehensive list of all ports reserved for protocols, or registered as used by various common systems, by searching the web pages maintained by the Internet Engineering Task Force (IETF) and the Internet Assigned Numbers Authority (IANA). The IETF is the organization responsible for maintaining web protocols and standards. The IANA is the central coordinator for the assignment of unique parameter values for Internet protocols. Another standards body, the W3 (for WWW), also maintains relevant documents. See these web pages for more details:

http://www.ietf.org
http://www.iana.org/numbers.html
http://www.iana.org/assignments/port-numbers
http://www.w3.org

It's not impossible that more recent repositories for standard protocol specifications will arise during this book's shelf-life, but the IETF web site will likely be the main authority for some time to come. If you do look, though, be warned that the details are, well, detailed. Because Python's protocol modules hide most of the socket and messaging complexity documented in the protocol standards, you usually don't need to memorize these documents to get web work done in Python.

10.3 Socket Programming

Now that we've seen how sockets figure into the Internet picture, let's move on to explore the tools that Python provides for programming sockets with Python scripts. This section shows you how to use the Python socket interface to perform low-level network communications; in later chapters, we will instead use one of the higher-level protocol modules that hide underlying sockets.

The basic socket interface in Python is the standard library's socket module. Like the os POSIX module, Python's socket module is just a thin wrapper (interface layer) over the underlying C library's socket calls. Like Python files, it's also object-based: methods of a socket object implemented by this module call out to the corresponding C library's operations after data conversions. The socket module also includes tools for converting bytes to a standard network ordering, wrapping socket objects in simple file objects, and more. It supports socket programming on any machine that supports BSD-style sockets -- MS Windows, Linux, Unix, etc. -- and so provides a portable socket interface.

10.3.1 Socket Basics

To create a connection between machines, Python programs import the socket module, create a socket object, and call the object's methods to establish connections and send and receive data. Socket object methods map directly to socket calls in the C library. For example, the script in Example 10-1 implements a program that simply listens for a connection on a socket, and echoes back over a socket whatever it receives through that socket, adding 'Echo=>' string prefixes.

Example 10-1. PP2E\Internet\Sockets\echo-server.py
#########################################################
# Server side: open a socket on a port, listen for
# a message from a client, and send an echo reply; 
# this is a simple one-shot listen/reply per client, 
# but it goes into an infinite loop to listen for 
# more clients as long as this server script runs; 
#########################################################

from socket import *                    # get socket constructor and constants
myHost = ''                             # server machine, '' means local host
myPort = 50007                          # listen on a non-reserved port number

sockobj = socket(AF_INET, SOCK_STREAM)       # make a TCP socket object
sockobj.bind((myHost, myPort))               # bind it to server port number 
sockobj.listen(5)                            # listen, allow 5 pending connects

while 1:                                     # listen until process killed
    connection, address = sockobj.accept()   # wait for next client connect
    print 'Server connected by', address     # connection is a new socket
    while 1:
        data = connection.recv(1024)         # read next line on client socket
        if not data: break                   # send a reply line to the client
        connection.send('Echo=>' + data)     # until eof when socket closed
    connection.close() 

As mentioned earlier, we usually call programs like this that listen for incoming connections servers because they provide a service that can be accessed at a given machine and port on the Internet. Programs that connect to such a server to access its service are generally called clients. Example 10-2 shows a simple client implemented in Python.

Example 10-2. PP2E\Internet\Sockets\echo-client.py
#############################################################
# Client side: use sockets to send data to the server, and 
# print server's reply to each message line; 'localhost' 
# means that the server is running on the same machine as 
# the client, which lets us test client and server on one 
# machine;  to test over the Internet, run a server on a remote 
# machine, and set serverHost or argv[1] to machine's domain
# name or IP addr;  Python sockets are a portable BSD socket
# interface, with object methods for standard socket calls;
#############################################################

import sys
from socket import *              # portable socket interface plus constants
serverHost = 'localhost'          # server name, or: 'starship.python.net'
serverPort = 50007                # non-reserved port used by the server

message = ['Hello network world']           # default text to send to server
if len(sys.argv) > 1:
    serverHost = sys.argv[1]                # or server from cmd line arg 1
    if len(sys.argv) > 2:                   # or text from cmd line args 2..n
        message = sys.argv[2:]              # one message for each arg listed

sockobj = socket(AF_INET, SOCK_STREAM)      # make a TCP/IP socket object
sockobj.connect((serverHost, serverPort))   # connect to server machine and port

for line in message:
    sockobj.send(line)                      # send line to server over socket
    data = sockobj.recv(1024)               # receive line from server: up to 1k
    print 'Client received:', `data`

sockobj.close()                             # close socket to send eof to server
10.3.1.1 Server socket calls

Before we see these programs in action, let's take a minute to explain how this client and server do their stuff. Both are fairly simple examples of socket scripts, but they illustrate common call patterns of most socket-based programs. In fact, this is boilerplate code: most socket programs generally make the same socket calls that our two scripts do, so let's step through the important points of these scripts line by line.

Programs such as Example 10-1 that provide services for other programs with sockets generally start out by following this sequence of calls:

sockobj = socket(AF_INET, SOCK_STREAM)

Uses the Python socket module to create a TCP socket object. The names AF_INET and SOCK_STREAM are preassigned variables defined by and imported form the socket module; using them in combination means "create a TCP/IP socket," the standard communication device for the Internet. More specifically, AF_INET means the IP address protocol, and SOCK_STREAM means the TCP transfer protocol.

If you use other names in this call, you can instead create things like UDP connectionless sockets (use SOCK_DGRAM second) and Unix domain sockets on the local machine (use AF_UNIX first), but we won't do so in this book. See the Python library manual for details on these and other socket module options.

sockobj.bind((myHost, myPort))

Associates the socket object to an address -- for IP addresses, we pass a server machine name and port number on that machine. This is where the server identifies the machine and port associated with the socket. In server programs, the hostname is typically an empty string (""), which means the machine that the script runs on and the port is a number outside the range 0-1023 (which is reserved for standard protocols, described earlier). Note that each unique socket dialog you support must have its own port number; if you try to open a socket on a port already in use, Python will raise an exception. Also notice the nested parenthesis in this call -- for the AF_INET address protocol socket here, we pass the host/port socket address to bind as a two-item tuple object (pass a string for AF_UNIX). Technically, bind takes a tuple of values appropriate for the type of socket created (but see the next Note box about the older and deprecated convention of passing values to this function as distinct arguments).

sockobj.listen(5)

Starts listening for incoming client connections and allows for a backlog of up to five pending requests. The value passed sets the number of incoming client requests queued by the operating system before new requests are denied (which only happens if a server isn't fast enough to process requests before the queues fill up). A value of 5 is usually enough for most socket-based programs; the value must be at least 1.

At this point, the server is ready to accept connection requests from client programs running on remote machines (or the same machine), and falls into an infinite loop waiting for them to arrive:

connection, address = sockobj.accept()

Waits for the next client connection request to occur; when it does, the accept call returns a brand new socket object over which data can be transferred from and to the connected client. Connections are accepted on sockobj, but communication with a client happens on connection, the new socket. This call actually returns a two-item tuple -- address is the connecting client's Internet address. We can call accept more than one time, to service multiple client connections; that's why each call returns a new, distinct socket for talking to a particular client.

Once we have a client connection, we fall into another loop to receive data from the client in blocks of 1024 bytes at a time, and echo each block back to the client:

data = connection.recv(1024)

Reads at most 1024 more bytes of the next message sent from a client (i.e., coming across the network), and returns it to the script as a string. We get back an empty string when the client has finished -- end-of-file is triggered when the client closes its end of the socket.

connection.send('Echo=>' + data)

Sends the latest data block back to the client program, prepending the string 'Echo=>' to it first. The client program can then recv what we send here -- the next reply line.

connection.close()

Shuts down the connection with this particular client.

After talking with a given client, the server goes back to its infinite loop, and waits for the next client connection request.

10.3.1.2 Client socket calls

On the other hand, client programs like the one shown in Example 10-2 follow simpler call sequences. The main thing to keep in mind is that the client and server must specify the same port number when opening their sockets, and the client must identify the machine on which the server is running (in our scripts, server and client agree to use port number 50007 for their conversation, outside the standard protocol range):

sockobj = socket(AF_INET, SOCK_STREAM)

Creates a Python socket object in the client program, just like the server.

sockobj.connect((serverHost, serverPort))

Opens a connection to the machine and port on which the server program is listening for client connections. This is where the client specifies the name of the service to be contacted. In the client, we can either specify the name of the remote machine as a domain name (e.g., starship.python.net) or numeric IP address. We can also give the server name as localhost to specify that the server program is running on the same machine as the client; that comes in handy for debugging servers without having to connect to the Net. And again, the client's port number must match the server's exactly. Note the nested parentheses again -- just as in server bind calls, we really pass the server's host/port address to connect in a tuple object.

Once the client establishes a connection to the server, it falls into a loop sending a message one line at a time and printing whatever the server sends back after each line is sent:

sockobj.send(line)

Transfers the next message line to the server over the socket.

data = sockobj.recv(1024)

Reads the next reply line sent by the server program. Technically, this reads up to 1024 bytes of the next reply message and returns it as a string.

sockobj.close()

Closes the connection with the server, sending it the end-of-file signal.

And that's it. The server exchanges one or more lines of text with each client that connects. The operating system takes care of locating remote machines, routing bytes sent between programs across the Internet, and (with TCP) making sure that our messages arrive intact. That involves a lot of processing, too -- our strings may ultimately travel around the world, crossing phone wires, satellite links, and more along the way. But we can be happily ignorant of what goes on beneath the socket call layer when programming in Python.

In older Python code, you may see the AF_INET server address passed to the server-side bind and client-side connect socket methods as two distinct arguments, instead of a two-item tuple:

soc.bind(host,port)     vs soc.bind((host,port))
soc.connect(host,port)  vs soc.connect((host,port))

This two-argument form is now deprecated, and only worked at all due to a shortcoming in earlier Python releases (unfortunately, the Python library manual's socket example used the two-argument form too!). The tuple server address form is preferred, and, in a rare Python break with full backward-compatibility, will likely be the only one that will work in future Python releases.

10.3.1.3 Running socket programs locally

Okay, let's put this client and server to work. There are two ways to run these scripts -- either on the same machine or on two different machines. To run the client and the server on the same machine, bring up two command-line consoles on your computer, start the server program in one, and run the client repeatedly in the other. The server keeps running and responds to requests made each time you run the client script in the other window.

For instance, here is the text that shows up in the MS-DOS console window where I've started the server script:

C:\...\PP2E\Internet\Sockets>python echo-server.py
Server connected by ('127.0.0.1', 1025)
Server connected by ('127.0.0.1', 1026)
Server connected by ('127.0.0.1', 1027)

The output here gives the address (machine IP name and port number) of each connecting client. Like most servers, this one runs perpetually, listening for client connection requests. This one receives three, but I have to show you the client window's text for you to understand what this means:

C:\...\PP2E\Internet\Sockets>python echo-client.py
Client received: 'Echo=>Hello network world'

C:\...\PP2E\Internet\Sockets>python echo-client.py localhost spam Spam SPAM
Client received: 'Echo=>spam'
Client received: 'Echo=>Spam'
Client received: 'Echo=>SPAM'

C:\...\PP2E\Internet\Sockets>python echo-client.py localhost Shrubbery
Client received: 'Echo=>Shrubbery'

Here, I ran the client script three times, while the server script kept running in the other window. Each client connected to the server, sent it a message of one or more lines of text, and read back the server's reply -- an echo of each line of text sent from the client. And each time a client is run, a new connection message shows up in the server's window (that's why we got three).

It's important to notice that clients and server are running on the same machine here (a Windows PC). The server and client agree on port number, but use machine names "" and "localhost" respectively, to refer to the computer that they are running on. In fact, there is no Internet connection to speak of. Sockets also work well as cross-program communications tools on a single machine.

10.3.1.4 Running socket programs remotely

To make these scripts talk over the Internet instead of on a single machine, we have to do some extra work to run the server on a different computer. First, upload the server's source file to a remote machine where you have an account and a Python. Here's how I do it with FTP; your server name and upload interface details may vary, and there are other ways to copy files to a computer (e.g., email, web-page post forms, etc.):[3]

C:\...\PP2E\Internet\Sockets>ftp starship.python.net
Connected to starship.python.net.
User (starship.python.net:(none)): lutz
331 Password required for lutz.
Password:
230 User lutz logged in.
ftp> put echo-server.py
200 PORT command successful.
150 Opening ASCII mode data connection for echo-server.py.
226 Transfer complete.
ftp: 1322 bytes sent in 0.06Seconds 22.03Kbytes/sec.
ftp> quit

Once you have the server program loaded on the other computer, you need to run it there. Connect to that computer and start the server program. I usually telnet into my server machine and start the server program as a perpetually running process from the command line.[4] The & syntax in Unix/Linux shells can be used to run the server script in the background; we could also make the server directly executable with a #! line and a chmod command (see Chapter 2, for details). Here is the text that shows up in a Window on my PC that is running a Telnet session connected to the Linux server where I have an account (less a few deleted informational lines):

C:\...\PP2E\Internet\Sockets>telnet starship.python.net
Red Hat Linux release 6.2 (Zoot)
Kernel 2.2.14-5.0smp on a 2-processor i686
login: lutz
Password:
[lutz@starship lutz]$ python echo-server.py &
[1] 4098

Now that the server is listening for connections on the Net, run the client on your local computer multiple times again. This time, the client runs on a different machine than the server, so we pass in the server's domain or IP name as a client command-line argument. The server still uses a machine name of "" because it always listens on whatever machine it runs upon. Here is what shows up in the server's Telnet window:

[lutz@starship lutz]$ Server connected by ('166.93.68.61', 1037)
Server connected by ('166.93.68.61', 1040)
Server connected by ('166.93.68.61', 1043)
Server connected by ('166.93.68.61', 1050)

And here is what appears in the MS-DOS console box where I run the client. A "connected by" message appears in the server Telnet window each time the client script is run in the client window:

C:\...\PP2E\Internet\Sockets>python echo-client.py starship.python.net
Client received: 'Echo=>Hello network world'

C:\...\PP2E\Internet\Sockets>python echo-client.py starship.python.net ni Ni NI
Client received: 'Echo=>ni'
Client received: 'Echo=>Ni'
Client received: 'Echo=>NI'

C:\...\PP2E\Internet\Sockets>python echo-client.py starship.python.net Shrubbery
Client received: 'Echo=>Shrubbery'

C:\...\PP2E\Internet\Sockets>ping starship.python.net
Pinging starship.python.net [208.185.174.112] with 32 bytes of data:
Reply from 208.185.174.112: bytes=32 time=311ms TTL=246
ctrl-C
C:\...\PP2E\Internet\Sockets>python echo-client.py 208.185.174.112 Does she?
Client received: 'Echo=>Does'
Client received: 'Echo=>she?'

The "ping" command can be used to get an IP address for a machine's domain name; either machine name form can be used to connect in the client. This output is perhaps a bit understated -- a lot is happening under the hood. The client, running on my Windows laptop, connects with and talks to the server program running on a Linux machine perhaps thousands of miles away. It all happens about as fast as when client and server both run on the laptop, and it uses the same library calls; only the server name passed to clients differs.

10.3.1.5 Socket pragmatics

Before we move on, there are three practical usage details you should know. First of all, you can run the client and server like this on any two Internet-aware machines where Python is installed. Of course, to run clients and server on different computers, you need both a live Internet connection and access to another machine on which to run the server. You don't need a big, expensive Internet link, though -- a simple modem and dialup Internet account will do for clients. When sockets are opened, Python is happy to use whatever connectivity you have, be it a dedicated T1 line, or a dialup modem account.

On my laptop PC, for instance, Windows automatically dials out to my ISP when clients are started or when Telnet server sessions are opened. In this book's examples, server-side programs that run remotely are executed on a machine called starship.python.net. If you don't have an account of your own on such a server, simply run client and server examples on the same machine, as shown earlier; all you need then is a computer that allows sockets, and most do.

Secondly, the socket module generally raises exceptions if you ask for something invalid. For instance, trying to connect to a nonexistent server (or unreachable servers, if you have no Internet link) fails:

C:\...\PP2E\Internet\Sockets>python echo-client.py www.nonesuch.com hello
Traceback (innermost last):
  File "echo-client.py", line 24, in ?
    sockobj.connect((serverHost, serverPort))   # connect to server machine... 
  File "<string>", line 1, in connect
socket.error: (10061, 'winsock error')

Finally, also be sure to kill the server process before restarting it again, or else the port number will be still in use, and you'll get another exception:

[lutz@starship uploads]$ ps -x
  PID TTY      STAT   TIME COMMAND
 5570 pts/0    S      0:00 -bash
 5570 pts/0    S      0:00 -bash
 5633 pts/0    S      0:00 python echo-server.py
 5634 pts/0    R      0:00 ps -x
[lutz@starship uploads]$ python echo-server.py
Traceback (most recent call last):
  File "echo-server.py", line 14, in ?
    sockobj.bind((myHost, myPort))               # bind it to server port number
socket.error: (98, 'Address already in use')

Under Python 1.5.2, a series of Ctrl-C's will kill the server on Linux (be sure to type fg to bring it to the foreground first if started with an &):

[lutz@starship uploads]$ python echo-server.py
ctrl-c
Traceback (most recent call last):
  File "echo-server.py", line 18, in ?
    connection, address = sockobj.accept()   # wait for next client connect
KeyboardInterrupt

A Ctrl-C kill key combination won't kill the server on my Windows machine, however. To kill the perpetually running server process running locally on Windows, you may need to type a Ctrl-Alt-Delete key combination, and then end the Python task by selecting it in the process listbox that appears. You can usually also kill a server on Linux with a kill -9 pid shell command if it is running in another window or in the background, but Ctrl-C is less typing.

10.3.1.6 Spawning clients in parallel

To see how the server handles the load, let's fire up eight copies of the client script in parallel using the script in Example 10-3 (see the end of Chapter 3, for details on the launchmodes module used here to spawn clients).

Example 10-3. PP2E\Internet\Sockets\testecho.py
import sys, string
from PP2E.launchmodes import QuietPortableLauncher

numclients = 8
def start(cmdline): QuietPortableLauncher(cmdline, cmdline)()

# start('echo-server.py')              # spawn server locally if not yet started

args = string.join(sys.argv[1:], ' ')  # pass server name if running remotely
for i in range(numclients):
    start('echo-client.py %s' % args)  # spawn 8? clients to test the server

To run this script, pass no arguments to talk to a server listening on port 50007 on the local machine; pass a real machine name to talk to a server running remotely. On Windows, the clients' output is discarded when spawned from this script:

C:\...\PP2E\Internet\Sockets>python testecho.py

C:\...\PP2E\Internet\Sockets>python testecho.py starship.python.net

If the spawned clients connect to a server run locally, connection messages show up in the server's window on the local machine:

C:\...\PP2E\Internet\Sockets>python echo-server.py
Server connected by ('127.0.0.1', 1283)
Server connected by ('127.0.0.1', 1284)
Server connected by ('127.0.0.1', 1285)
Server connected by ('127.0.0.1', 1286)
Server connected by ('127.0.0.1', 1287)
Server connected by ('127.0.0.1', 1288)
Server connected by ('127.0.0.1', 1289)
Server connected by ('127.0.0.1', 1290)

If the server is running remotely, the client connection messages instead appear in the window displaying the Telnet connection to the remote computer:

[lutz@starship lutz]$ python echo-server.py
Server connected by ('166.93.68.61', 1301)
Server connected by ('166.93.68.61', 1302)
Server connected by ('166.93.68.61', 1308)
Server connected by ('166.93.68.61', 1309)
Server connected by ('166.93.68.61', 1313)
Server connected by ('166.93.68.61', 1314)
Server connected by ('166.93.68.61', 1307)
Server connected by ('166.93.68.61', 1312)

Keep in mind, however, that this works for our simple scripts only because the server doesn't take a long time to respond to each client's requests -- it can get back to the top of the server script's outer while loop in time to process the next incoming client. If it could not, we would probably need to change the server to handle each client in parallel, or some might be denied a connection. Technically, client connections would fail after five clients are already waiting for the server's attention, as specified in the server's listen call. We'll see how servers can handle multiple clients robustly in the next section.

10.3.1.7 Talking to reserved ports

It's also important to know that this client and server engage in a proprietary sort of discussion, and so use a port number 50007 outside the range reserved for standard protocols (0-1023). There's nothing preventing a client from opening a socket on one of these special ports, however. For instance, the following client-side code connects to programs listening on the standard email, FTP, and HTTP web server ports on three different server machines:

C:\...\PP2E\Internet\Sockets>python
>>> from socket import *
>>> sock = socket(AF_INET, SOCK_STREAM)
>>> sock.connect(('mail.rmi.net', 110))         # talk to RMI POP mail server
>>> print sock.recv(40)
+OK Cubic Circle's v1.31 1998/05/13 POP3
>>> sock.close()

>>> sock = socket(AF_INET, SOCK_STREAM)
>>> sock.connect(('www.python.org', 21))        # talk to Python FTP server
>>> print sock.recv(40)
220 python.org FTP server (Version wu-2.
>>> sock.close()

>>> sock = socket(AF_INET, SOCK_STREAM)
>>> sock.connect(('starship.python.net', 80))   # starship HTTP web server
>>> sock.send('GET /\r\n')                      # fetch root web page
7
>>> sock.recv(60)
'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">\012<HTM'
>>> sock.recv(60)
'L>\012 <HEAD>\012  <TITLE>Starship Slowly Recovering</TITLE>\012 </HE'

If we know how to interpret the output returned by these ports' servers, we could use raw sockets like this to fetch email, transfer files, and grab web pages and invoke server-side scripts. Fortunately, though, we don't have to worry about all the underlying details -- Python's poplib, ftplib, httplib, and urllib modules provide higher-level interfaces for talking to servers on these ports. Other Python protocol modules do the same for other standard ports (e.g., NNTP, Telnet, and so on). We'll meet some of these client-side protocol modules in the next chapter.[5]

By the way, it's okay to open client-side connections on reserved ports like this, but you can't install your own server-side scripts for these ports unless you have special permission:

[lutz@starship uploads]$ python
>>> from socket import *
>>> sock = socket(AF_INET, SOCK_STREAM)
>>> sock.bind(('', 80))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
socket.error: (13, 'Permission denied')

Even if run by a user with the required permission, you'll get the different exception we saw earlier if the port is already being used by a real web server. On computers being used as general servers, these ports really are reserved.

10.4 Handling Multiple Clients

The echo client and server programs shown previously serve to illustrate socket fundamentals. But the server model suffers from a fairly major flaw: if multiple clients try to connect to the server, and it takes a long time to process a given clients' request, the server will fail. More accurately, if the cost of handling a given request prevents the server from returning to the code that checks for new clients in a timely manner, it won't be able to keep up with all the requests, and some clients will eventually be denied connections.

In real-world client/server programs, it's far more typical to code a server so as to avoid blocking new requests while handling a current client's request. Perhaps the easiest way to do so is to service each client's request in parallel -- in a new process, in a new thread, or by manually switching (multiplexing) between clients in an event loop. This isn't a socket issue per se, and we've already learned how to start processes and threads in Chapter 3. But since these schemes are so typical of socket server programming, let's explore all three ways to handle client requests in parallel here.

10.4.1 Forking Servers

The script in Example 10-4 works like the original echo server, but instead forks a new process to handle each new client connection. Because the handleClient function runs in a new process, the dispatcher function can immediately resume its main loop, to detect and service a new incoming request.

Example 10-4. PP2E\Internet\Sockets\fork-server.py
#########################################################
# Server side: open a socket on a port, listen for
# a message from a client, and send an echo reply; 
# forks a process to handle each client connection;
# child processes share parent's socket descriptors;
# fork is less portable than threads--not yet on Windows;
#########################################################

import os, time, sys
from socket import *                      # get socket constructor and constants
myHost = ''                               # server machine, '' means local host
myPort = 50007                            # listen on a non-reserved port number

sockobj = socket(AF_INET, SOCK_STREAM)           # make a TCP socket object
sockobj.bind((myHost, myPort))                   # bind it to server port number
sockobj.listen(5)                                # allow 5 pending connects

def now():                                       # current time on server
    return time.ctime(time.time())

activeChildren = []
def reapChildren():                              # reap any dead child processes
    while activeChildren:                        # else may fill up system table
        pid,stat = os.waitpid(0, os.WNOHANG)     # don't hang if no child exited
        if not pid: break
        activeChildren.remove(pid)

def handleClient(connection):                    # child process: reply, exit
    time.sleep(5)                                # simulate a blocking activity
    while 1:                                     # read, write a client socket
        data = connection.recv(1024)             # till eof when socket closed
        if not data: break
        connection.send('Echo=>%s at %s' % (data, now()))
    connection.close() 
    os._exit(0)

def dispatcher():                                # listen until process killed
    while 1:                                     # wait for next connection,
        connection, address = sockobj.accept()   # pass to process for service
        print 'Server connected by', address,
        print 'at', now()
        reapChildren()                           # clean up exited children now
        childPid = os.fork()                     # copy this process
        if childPid == 0:                        # if in child process: handle
            handleClient(connection)
        else:                                    # else: go accept next connect
            activeChildren.append(childPid)      # add to active child pid list

dispatcher()
10.4.1.1 Running the forking server

Parts of this script are a bit tricky, and most of its library calls work only on Unix-like platforms (not Windows). But before we get into too many details, let's start up our server and handle a few client requests. First off, notice that to simulate a long-running operation (e.g., database updates, other network traffic), this server adds a five-second time.sleep delay in its client handler function, handleClient. After the delay, the original echo reply action is performed. That means that when we run a server and clients this time, clients won't receive the echo reply until five seconds after they've sent their requests to the server.

To help keep track of requests and replies, the server prints its system time each time a client connect request is received, and adds its system time to the reply. Clients print the reply time sent back from the server, not their own -- clocks on the server and client may differ radically, so to compare apples to apples, all times are server times. Because of the simulated delays, we also usually must start each client in its own console window on Windows (on some platforms, clients will hang in a blocked state while waiting for their reply).

But the grander story here is that this script runs one main parent process on the server machine, which does nothing but watch for connections (in dispatcher), plus one child process per active client connection, running in parallel with both the main parent process and the other client processes (in handleClient). In principle, the server can handle any number of clients without bogging down. To test, let's start the server remotely in a Telnet window, and start three clients locally in three distinct console windows:

[server telnet window]
[lutz@starship uploads]$ uname -a 
Linux starship ...
[lutz@starship uploads]$ python fork-server.py 
Server connected by ('38.28.162.194', 1063) at Sun Jun 18 19:37:49 2000
Server connected by ('38.28.162.194', 1064) at Sun Jun 18 19:37:49 2000
Server connected by ('38.28.162.194', 1067) at Sun Jun 18 19:37:50 2000

 [client window 1]
C:\...\PP2E\Internet\Sockets>python echo-client.py starship.python.net 
Client received: 'Echo=>Hello network world at Sun Jun 18 19:37:54 2000'

 [client window 2]
C:\...\PP2E\Internet\Sockets>python echo-client.py starship.python.net Bruce 
Client received: 'Echo=>Bruce at Sun Jun 18 19:37:54 2000'

 [client window 3]
C:\...\PP2E\Internet\Sockets>python echo-client.py starship.python.net The 
Meaning of Life 
Client received: 'Echo=>The at Sun Jun 18 19:37:55 2000'
Client received: 'Echo=>Meaning at Sun Jun 18 19:37:56 2000'
Client received: 'Echo=>of at Sun Jun 18 19:37:56 2000'
Client received: 'Echo=>Life at Sun Jun 18 19:37:56 2000'

Again, all times here are on the server machine. This may be a little confusing because there are four windows involved. In English, the test proceeds as follows:

  1. The server starts running remotely.

  2. All three clients are started and connect to the server at roughly the same time.

  3. On the server, the client requests trigger three forked child processes, which all immediately go to sleep for five seconds (to simulate being busy doing something useful).

  4. Each client waits until the server replies, which eventually happens five seconds after their initial requests.

In other words, all three clients are serviced at the same time, by forked processes, while the main parent process continues listening for new client requests. If clients were not handled in parallel like this, no client could connect until the currently connected client's five-second delay expired.

In a more realistic application, that delay could be fatal if many clients were trying to connect at once -- the server would be stuck in the action we're simulating with time.sleep, and not get back to the main loop to accept new client requests. With process forks per request, all clients can be serviced in parallel.

Notice that we're using the same client script here (echo-client.py), just a different server; clients simply send and receive data to a machine and port, and don't care how their requests are handled on the server. Also note that the server is running remotely on a Linux machine. (As we learned in Chapter 3, the fork call is not supported on Windows in Python at the time this book was written.) We can also run this test on a Linux server entirely, with two Telnet windows. It works about the same as when clients are started locally, in a DOS console window, but here "local" means a remote machine you're telneting to locally:

 [one telnet window]
[lutz@starship uploads]$ python fork-server.py & 
[1] 3379
Server connected by ('127.0.0.1', 2928) at Sun Jun 18 22:44:50 2000
Server connected by ('127.0.0.1', 2929) at Sun Jun 18 22:45:08 2000
Server connected by ('208.185.174.112', 2930) at Sun Jun 18 22:45:50 2000

 [another telnet window, same machine]
[lutz@starship uploads]$ python echo-client.py 
Client received: 'Echo=>Hello network world at Sun Jun 18 22:44:55 2000'

[lutz@starship uploads]$ python echo-client.py localhost niNiNI 
Client received: 'Echo=>niNiNI at Sun Jun 18 22:45:13 2000'

[lutz@starship uploads]$ python echo-client.py starship.python.net Say no More! 
Client received: 'Echo=>Say at Sun Jun 18 22:45:55 2000'
Client received: 'Echo=>no at Sun Jun 18 22:45:55 2000'
Client received: 'Echo=>More! at Sun Jun 18 22:45:55 2000'

Now let's move on to the tricky bits. This server script is fairly straightforward as forking code goes, but a few comments about some of the library tools it employs are in order.

10.4.1.2 Forking processes

We met os.fork in Chapter 3, but recall that forked processes are essentially a copy of the process that forks them, and so they inherit file and socket descriptors from their parent process. Because of that, the new child process that runs the handleClient function has access to the connection socket created in the parent process. Programs know they are in a forked child process if the fork call returns 0; otherwise, the original parent process gets back the new child's ID.

10.4.1.3 Exiting from children

In earlier fork examples, child processes usually call one of the exec variants to start a new program in the child process. Here, instead, the child process simply calls a function in the same program and exits with os._exit. It's imperative to call os._exit here -- if we did not, each child would live on after handleClient returns, and compete for accepting new client requests.

In fact, without the exit call, we'd wind up with as many perpetual server processes as requests served -- remove the exit call and do a ps shell command after running a few clients, and you'll see what I mean. With the call, only the single parent process listens for new requests. os._exit is like sys.exit, but it exits the calling process immediately without cleanup actions. It's normally only used in child processes, and sys.exit is used everywhere else.

10.4.1.4 Killing the zombies

Note, however, that it's not quite enough to make sure that child processes exit and die. On systems like Linux, parents must also be sure to issue a wait system call to remove the entries for dead child processes from the system's process table. If we don't, the child processes will no longer run, but they will consume an entry in the system process table. For long-running servers, these bogus entries may become problematic.

It's common to call such dead-but-listed child processes "zombies": they continue to use system resources even though they've already passed over to the great operating system beyond. To clean up after child processes are gone, this server keeps a list, activeChildren, of the process IDs of all child processes it spawns. Whenever a new incoming client request is received, the server runs its reapChildren to issue a wait for any dead children by issuing the standard Python os.waitpid(0,os.WNOHANG) call.

The os.waitpid call attempts to wait for a child process to exit and returns its process ID and exit status. With a for its first argument, it waits for any child process. With the WNOHANG parameter for its second, it does nothing if no child process has exited (i.e., it does not block or pause the caller). The net effect is that this call simply asks the operating system for the process ID of any child that has exited. If any have, the process ID returned is removed both from the system process table and from this script's activeChildren list.

To see why all this complexity is needed, comment out the reapChildren call in this script, run it on a server, and then run a few clients. On my Linux server, a ps -f full process listing command shows that all the dead child processes stay in the system process table (show as <defunct>):

[lutz@starship uploads]$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
lutz      3270  3264  0 22:33 pts/1    00:00:00 -bash
lutz      3311  3270  0 22:37 pts/1    00:00:00 python fork-server.py
lutz      3312  3311  0 22:37 pts/1    00:00:00 [python <defunct>]
lutz      3313  3311  0 22:37 pts/1    00:00:00 [python <defunct>]
lutz      3314  3311  0 22:37 pts/1    00:00:00 [python <defunct>]
lutz      3316  3311  0 22:37 pts/1    00:00:00 [python <defunct>]
lutz      3317  3311  0 22:37 pts/1    00:00:00 [python <defunct>]
lutz      3318  3311  0 22:37 pts/1    00:00:00 [python <defunct>]
lutz      3322  3270  0 22:38 pts/1    00:00:00 ps -f

When the reapChildren command is reactivated, dead child zombie entries are cleaned up each time the server gets a new client connection request, by calling the Python os.waitpid function. A few zombies may accumulate if the server is heavily loaded, but will remain only until the next client connection is received:

[lutz@starship uploads]$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
lutz      3270  3264  0 22:33 pts/1    00:00:00 -bash
lutz      3340  3270  0 22:41 pts/1    00:00:00 python fork-server.py
lutz      3341  3340  0 22:41 pts/1    00:00:00 [python <defunct>]
lutz      3342  3340  0 22:41 pts/1    00:00:00 [python <defunct>]
lutz      3343  3340  0 22:41 pts/1    00:00:00 [python <defunct>]
lutz      3344  3270  6 22:41 pts/1    00:00:00 ps -f
[lutz@starship uploads]$ 
Server connected by ('38.28.131.174', 1170) at Sun Jun 18 22:41:43 2000

[lutz@starship uploads]$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
lutz      3270  3264  0 22:33 pts/1    00:00:00 -bash
lutz      3340  3270  0 22:41 pts/1    00:00:00 python fork-server.py
lutz      3345  3340  0 22:41 pts/1    00:00:00 [python <defunct>]
lutz      3346  3270  0 22:41 pts/1    00:00:00 ps -f

If you type fast enough, you can actually see a child process morph from a real running program into a zombie. Here, for example, a child spawned to handle a new request (process ID 11785) changes to <defunct> on exit. Its process entry will be removed completely when the next request is received:

[lutz@starship uploads]$ 
Server connected by ('38.28.57.160', 1106) at Mon Jun 19 22:34:39 2000
[lutz@starship uploads]$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
lutz     11089 11088  0 21:13 pts/2    00:00:00 -bash
lutz     11780 11089  0 22:34 pts/2    00:00:00 python fork-server.py
lutz     11785 11780  0 22:34 pts/2    00:00:00 python fork-server.py
lutz     11786 11089  0 22:34 pts/2    00:00:00 ps -f

[lutz@starship uploads]$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
lutz     11089 11088  0 21:13 pts/2    00:00:00 -bash
lutz     11780 11089  0 22:34 pts/2    00:00:00 python fork-server.py
lutz     11785 11780  0 22:34 pts/2    00:00:00 [python <defunct>]
lutz     11787 11089  0 22:34 pts/2    00:00:00 ps -f
10.4.1.5 Preventing zombies with signal handlers

On some systems, it's also possible to clean up zombie child processes by resetting the signal handler for the SIGCHLD signal raised by the operating system when a child process exits. If a Python script assigns the SIG_IGN (ignore) action as the SIGCHLD signal handler, zombies will be removed automatically and immediately as child processes exit; the parent need not issue wait calls to clean up after them. Because of that, this scheme is a simpler alternative to manually reaping zombies (on platforms where it is supported).

If you've already read Chapter 3, you know that Python's standard signal module lets scripts install handlers for signals -- software-generated events. If you haven't read that chapter, here is a brief bit of background to show how this pans out for zombies. The program in Example 10-5 installs a Python-coded signal handler function to respond to whatever signal number you type on the command line.

Example 10-5. PP2E\Internet\Sockets\signal-demo.py
##########################################################
# Demo Python's signal module; pass signal number as a
# command-line arg, use a "kill -N pid" shell command
# to send this process a signal; e.g., on my linux 
# machine, SIGUSR1=10, SIGUSR2=12, SIGCHLD=17, and 
# SIGCHLD handler stays in effect even if not restored:
# all other handlers restored by Python after caught,
# but SIGCHLD is left to the platform's implementation;
# signal works on Windows but defines only a few signal
# types; signals are not very portable in general;
##########################################################

import sys, signal, time

def now():
    return time.ctime(time.time())

def onSignal(signum, stackframe):                # python signal handler
    print 'Got signal', signum, 'at', now()      # most handlers stay in effect
    if signum == signal.SIGCHLD:                 # but sigchld handler is not 
        print 'sigchld caught'
        #signal.signal(signal.SIGCHLD, onSignal)

signum = int(sys.argv[1])
signal.signal(signum, onSignal)                  # install signal handler
while 1: signal.pause()                          # sleep waiting for signals

To run this script, simply put it in the background and send it signals by typing the kill -signal-number process-id shell command line. Process IDs are listed in the PID column of ps command results. Here is this script in action catching signal numbers 10 (reserved for general use) and 9 (the unavoidable terminate signal):

[lutz@starship uploads]$ python signal-demo.py 10 &
[1] 11297
[lutz@starship uploads]$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
lutz     11089 11088  0 21:13 pts/2    00:00:00 -bash
lutz     11297 11089  0 21:49 pts/2    00:00:00 python signal-demo.py 10
lutz     11298 11089  0 21:49 pts/2    00:00:00 ps -f

[lutz@starship uploads]$ kill -10 11297
Got signal 10 at Mon Jun 19 21:49:27 2000

[lutz@starship uploads]$ kill -10 11297
Got signal 10 at Mon Jun 19 21:49:29 2000

[lutz@starship uploads]$ kill -10 11297
Got signal 10 at Mon Jun 19 21:49:32 2000

[lutz@starship uploads]$ kill -9 11297
[1]+  Killed                  python signal-demo.py 10

And here the script catches signal 17, which happens to be SIGCHLD on my Linux server. Signal numbers vary from machine to machine, so you should normally use their names, not their numbers. SIGCHLD behavior may vary per platform as well (see the signal module's library manual entry for more details):

[lutz@starship uploads]$ python signal-demo.py 17 &
[1] 11320
[lutz@starship uploads]$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
lutz     11089 11088  0 21:13 pts/2    00:00:00 -bash
lutz     11320 11089  0 21:52 pts/2    00:00:00 python signal-demo.py 17
lutz     11321 11089  0 21:52 pts/2    00:00:00 ps -f

[lutz@starship uploads]$ kill -17 11320
Got signal 17 at Mon Jun 19 21:52:24 2000
[lutz@starship uploads] sigchld caught

[lutz@starship uploads]$ kill -17 11320
Got signal 17 at Mon Jun 19 21:52:27 2000
[lutz@starship uploads]$ sigchld caught

Now, to apply all this to kill zombies, simply set the SIGCHLD signal handler to the SIG_IGN ignore handler action; on systems where this assignment is supported, child processes will be cleaned up when they exit. The forking server variant shown in Example 10-6 uses this trick to manage its children.

Example 10-6. PP2E\Internet\Sockets\fork-server-signal.py
#########################################################
# Same as fork-server.py, but use the Python signal
# module to avoid keeping child zombie processes after
# they terminate, not an explicit loop before each new 
# connection; SIG_IGN means ignore, and may not work with
# SIG_CHLD child exit signal on all platforms; on Linux,
# socket.accept cannot be interrupted with a signal;
#########################################################

import os, time, sys, signal, signal
from socket import *                      # get socket constructor and constants
myHost = ''                               # server machine, '' means local host
myPort = 50007                            # listen on a non-reserved port number

sockobj = socket(AF_INET, SOCK_STREAM)           # make a TCP socket object
sockobj.bind((myHost, myPort))                   # bind it to server port number
sockobj.listen(5)                                # up to 5 pending connects
signal.signal(signal.SIGCHLD, signal.SIG_IGN)    # avoid child zombie processes

def now():                                       # time on server machine
    return time.ctime(time.time())

def handleClient(connection):                    # child process replies, exits
    time.sleep(5)                                # simulate a blocking activity
    while 1:                                     # read, write a client socket
        data = connection.recv(1024)
        if not data: break  
        connection.send('Echo=>%s at %s' % (data, now()))
    connection.close() 
    os._exit(0)

def dispatcher():                                # listen until process killed
    while 1:                                     # wait for next connection,
        connection, address = sockobj.accept()   # pass to process for service
        print 'Server connected by', address,
        print 'at', now()
        childPid = os.fork()                     # copy this process
        if childPid == 0:                        # if in child process: handle
            handleClient(connection)             # else: go accept next connect

dispatcher()

Where applicable, this technique is:

In fact, there is really only one line dedicated to handling zombies here: the signal.signal call near the top, to set the handler. Unfortunately, this version is also even less portable than using os.fork in the first place, because signals may work slightly different from platform to platform. For instance, some platforms may not allow SIG_IGN to be used as the SIGCHLD action at all. On Linux systems, though, this simpler forking server variant works like a charm:

[lutz@starship uploads]$ 
Server connected by ('38.28.57.160', 1166) at Mon Jun 19 22:38:29 2000

[lutz@starship uploads]$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
lutz     11089 11088  0 21:13 pts/2    00:00:00 -bash
lutz     11827 11089  0 22:37 pts/2    00:00:00 python fork-server-signal.py
lutz     11835 11827  0 22:38 pts/2    00:00:00 python fork-server-signal.py
lutz     11836 11089  0 22:38 pts/2    00:00:00 ps -f

[lutz@starship uploads]$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
lutz     11089 11088  0 21:13 pts/2    00:00:00 -bash
lutz     11827 11089  0 22:37 pts/2    00:00:00 python fork-server-signal.py
lutz     11837 11089  0 22:38 pts/2    00:00:00 ps -f

Notice that in this version, the child process's entry goes away as soon as it exits, even before a new client request is received; no "defunct" zombie ever appears. More dramatically, if we now start up the script we wrote earlier that spawns eight clients in parallel (testecho.py) to talk to this server, all appear on the server while running, but are removed immediately as they exit:

[lutz@starship uploads]$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
lutz     11089 11088  0 21:13 pts/2    00:00:00 -bash
lutz     11827 11089  0 22:37 pts/2    00:00:00 python fork-server-signal.py
lutz     11839 11827  0 22:39 pts/2    00:00:00 python fork-server-signal.py
lutz     11840 11827  0 22:39 pts/2    00:00:00 python fork-server-signal.py
lutz     11841 11827  0 22:39 pts/2    00:00:00 python fork-server-signal.py
lutz     11842 11827  0 22:39 pts/2    00:00:00 python fork-server-signal.py
lutz     11843 11827  0 22:39 pts/2    00:00:00 python fork-server-signal.py
lutz     11844 11827  0 22:39 pts/2    00:00:00 python fork-server-signal.py
lutz     11845 11827  0 22:39 pts/2    00:00:00 python fork-server-signal.py
lutz     11846 11827  0 22:39 pts/2    00:00:00 python fork-server-signal.py
lutz     11848 11089  0 22:39 pts/2    00:00:00 ps -f

[lutz@starship uploads]$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
lutz     11089 11088  0 21:13 pts/2    00:00:00 -bash
lutz     11827 11089  0 22:37 pts/2    00:00:00 python fork-server-signal.py
lutz     11849 11089  0 22:39 pts/2    00:00:00 ps -f

10.4.2 Threading Servers

But don't do that . The forking model just described works well on some platforms in general, but suffers from some potentially big limitations:

Performance

On some machines, starting a new process can be fairly expensive in terms of time and space resources.

Portability

Forking processes is a Unix device; as we just noted, the fork call currently doesn't work on non-Unix platforms such as Windows.

Complexity

If you think that forking servers can be complicated, you're right. As we just saw, forking also brings with it all the shenanigans of managing zombies -- cleaning up after child processes that live shorter lives than their parents.

If you read Chapter 3, you know that the solution to all of these dilemmas is usually to use threads instead of processes. Threads run in parallel and share global (i.e., module and interpreter) memory, but they are usually less expensive to start, and work both on Unix-like machines and Microsoft Windows today. Furthermore, threads are simpler to program -- child threads die silently on exit, without leaving behind zombies to haunt the server.

Example 10-7 is another mutation of the echo server that handles client request in parallel by running them in threads, rather than processes.

Example 10-7. PP2E\Internet\Sockets\thread-server.py
#########################################################
# Server side: open a socket on a port, listen for
# a message from a client, and send an echo reply; 
# echos lines until eof when client closes socket;
# spawns a thread to handle each client connection;
# threads share global memory space with main thread;
# this is more portable than fork--not yet on Windows;
#########################################################

import thread, time
from socket import *                     # get socket constructor and constants
myHost = ''                              # server machine, '' means local host
myPort = 50007                           # listen on a non-reserved port number

sockobj = socket(AF_INET, SOCK_STREAM)           # make a TCP socket object
sockobj.bind((myHost, myPort))                   # bind it to server port number
sockobj.listen(5)                                # allow up to 5 pending connects

def now(): 
    return time.ctime(time.time())               # current time on the server

def handleClient(connection):                    # in spawned thread: reply
    time.sleep(5)                                # simulate a blocking activity
    while 1:                                     # read, write a client socket
        data = connection.recv(1024)
        if not data: break
        connection.send('Echo=>%s at %s' % (data, now()))
    connection.close() 

def dispatcher():                                # listen until process killd
    while 1:                                     # wait for next connection,
        connection, address = sockobj.accept()   # pass to thread for service
        print 'Server connected by', address,
        print 'at', now() 
        thread.start_new(handleClient, (connection,))

dispatcher()

This dispatcher delegates each incoming client connection request to a newly spawned thread running the handleClient function. Because of that, this server can process multiple clients at once, and the main dispatcher loop can get quickly back to the top to check for newly arrived requests. The net effect is that new clients won't be denied service due to a busy server.

Functionally, this version is similar to the fork solution (clients are handled in parallel), but it will work on any machine that supports threads, including Windows and Linux. Let's test it on both. First, start the server on a Linux machine and run clients on both Linux and Windows:

 [window 1: thread-based server process, server keeps accepting 
 client connections while threads are servicing prior requests]
[lutz@starship uploads]$ /usr/bin/python thread-server.py 
Server connected by ('127.0.0.1', 2934) at Sun Jun 18 22:52:52 2000
Server connected by ('38.28.131.174', 1179) at Sun Jun 18 22:53:31 2000
Server connected by ('38.28.131.174', 1182) at Sun Jun 18 22:53:35 2000
Server connected by ('38.28.131.174', 1185) at Sun Jun 18 22:53:37 2000

 [window 2: client, but on same server machine]
[lutz@starship uploads]$ python echo-client.py 
Client received: 'Echo=>Hello network world at Sun Jun 18 22:52:57 2000'

 [window 3: remote client, PC]
C:\...\PP2E\Internet\Sockets>python echo-client.py starship.python.net
Client received: 'Echo=>Hello network world at Sun Jun 18 22:53:36 2000'

 [window 4: client PC]
C:\...\PP2E\Internet\Sockets>python echo-client.py starship.python.net Bruce 
Client received: 'Echo=>Bruce at Sun Jun 18 22:53:40 2000'

 [window 5: client PC]
C:\...\PP2E\Internet\Sockets>python echo-client.py starship.python.net The
Meaning of Life 
Client received: 'Echo=>The at Sun Jun 18 22:53:42 2000'
Client received: 'Echo=>Meaning at Sun Jun 18 22:53:42 2000'
Client received: 'Echo=>of at Sun Jun 18 22:53:42 2000'
Client received: 'Echo=>Life at Sun Jun 18 22:53:42 2000'

Because this server uses threads instead of forked processes, we can run it portably on both Linux and a Windows PC. Here it is at work again, running on the same local Windows PC as its clients; again, the main point to notice is that new clients are accepted while prior clients are being processed in parallel with other clients and the main thread (in the five-second sleep delay):

 [window 1: server, on local PC]
C:\...\PP2E\Internet\Sockets>python thread-server.py 
Server connected by ('127.0.0.1', 1186) at Sun Jun 18 23:46:31 2000
Server connected by ('127.0.0.1', 1187) at Sun Jun 18 23:46:33 2000
Server connected by ('127.0.0.1', 1188) at Sun Jun 18 23:46:34 2000

 [window 2: client, on local
PC]
C:\...\PP2E\Internet\Sockets>python echo-client.py 
Client received: 'Echo=>Hello network world at Sun Jun 18 23:46:36 2000'

 [window 3: client]
C:\...\PP2E\Internet\Sockets>python echo-client.py localhost Brian 
Client received: 'Echo=>Brian at Sun Jun 18 23:46:38 2000'

 [window 4: client]
C:\...\PP2E\Internet\Sockets>python echo-client.py localhost Bright side of Life 
Client received: 'Echo=>Bright at Sun Jun 18 23:46:39 2000'
Client received: 'Echo=>side at Sun Jun 18 23:46:39 2000'
Client received: 'Echo=>of at Sun Jun 18 23:46:39 2000'
Client received: 'Echo=>Life at Sun Jun 18 23:46:39 2000'

Recall that a thread silently exits when the function it is running returns; unlike the process fork version, we don't call anything like os._exit in the client handler function (and we shouldn't -- it may kill all threads in the process!). Because of this, the thread version is not only more portable, but is also simpler.

10.4.3 Doing It with Classes: Server Frameworks

Now that I've shown you how to write forking and threading servers to process clients without blocking incoming requests, I should also tell you that there are standard tools in the Python library to make this process easier. In particular, the SocketServer module defines classes that implement all flavors of forking and threading servers that you are likely to be interested in. Simply create the desired kind of imported server object, passing in a handler object with a callback method of your own, as shown in Example 10-8.

Example 10-8. PP2E\Internet\Sockets\class-server.py
#########################################################
# Server side: open a socket on a port, listen for
# a message from a client, and send an echo reply; 
# this version uses the standard library module 
# SocketServer to do its work; SocketServer allows
# us to make a simple TCPServer, a ThreadingTCPServer,
# a ForkingTCPServer, and more, and routes each client
# connect request to a new instance of a passed-in 
# request handler object's handle method; also supports
# UDP and Unix domain sockets; see the library manual.
#########################################################

import SocketServer, time               # get socket server, handler objects
myHost = ''                             # server machine, '' means local host
myPort = 50007                          # listen on a non-reserved port number
def now(): 
    return time.ctime(time.time())

class MyClientHandler(SocketServer.BaseRequestHandler):
    def handle(self):                           # on each client connect
        print self.client_address, now()        # show this client's address
        time.sleep(5)                           # simulate a blocking activity
        while 1:                                # self.request is client socket
            data = self.request.recv(1024)      # read, write a client socket
            if not data: break
            self.request.send('Echo=>%s at %s' % (data, now()))
        self.request.close() 

# make a threaded server, listen/handle clients forever
myaddr = (myHost, myPort)
server = SocketServer.ThreadingTCPServer(myaddr, MyClientHandler)
server.serve_forever()   

This server works the same as the threading server we wrote by hand in the previous section, but instead focuses on service implementation (the customized handle method), not on threading details. It's run the same way, too -- here it is processing three clients started by hand, plus eight spawned by the testecho script shown in Example 10-3:

 [window1: server, serverHost='localhost' in echo-client.py]
C:\...\PP2E\Internet\Sockets>python class-server.py 
('127.0.0.1', 1189) Sun Jun 18 23:49:18 2000
('127.0.0.1', 1190) Sun Jun 18 23:49:20 2000
('127.0.0.1', 1191) Sun Jun 18 23:49:22 2000
('127.0.0.1', 1192) Sun Jun 18 23:49:50 2000
('127.0.0.1', 1193) Sun Jun 18 23:49:50 2000
('127.0.0.1', 1194) Sun Jun 18 23:49:50 2000
('127.0.0.1', 1195) Sun Jun 18 23:49:50 2000
('127.0.0.1', 1196) Sun Jun 18 23:49:50 2000
('127.0.0.1', 1197) Sun Jun 18 23:49:50 2000
('127.0.0.1', 1198) Sun Jun 18 23:49:50 2000
('127.0.0.1', 1199) Sun Jun 18 23:49:50 2000

 [window2: client]
C:\...\PP2E\Internet\Sockets>python echo-client.py 
Client received: 'Echo=>Hello network world at Sun Jun 18 23:49:23 2000'

 [window3: client]
C:\...\PP2E\Internet\Sockets>python echo-client.py localhost Robin 
Client received: 'Echo=>Robin at Sun Jun 18 23:49:25 2000'

 [window4: client]
C:\...\PP2E\Internet\Sockets>python echo-client.py localhost Brave Sir Robin 
Client received: 'Echo=>Brave at Sun Jun 18 23:49:27 2000'
Client received: 'Echo=>Sir at Sun Jun 18 23:49:27 2000'
Client received: 'Echo=>Robin at Sun Jun 18 23:49:27 2000'

C:\...\PP2E\Internet\Sockets>python testecho.py 

 [window4: contact remote server instead -- times skewed]
C:\...\PP2E\Internet\Sockets>python echo-client.py starship.python.net Brave
Sir Robin 
Client received: 'Echo=>Brave at Sun Jun 18 23:03:28 2000'
Client received: 'Echo=>Sir at Sun Jun 18 23:03:28 2000'
Client received: 'Echo=>Robin at Sun Jun 18 23:03:29 2000'

To build a forking server instead, just use class name ForkingTCPServer when creating the server object. The SocketServer module is more powerful than shown by this example; it also supports synchronous (nonparallel) servers, UDP and Unix sockets, and so on. See Python's library manual for more details. Also see the end of Chapter 15 for more on Python server implementation tools.[6]

10.4.4 Multiplexing Servers with select

So far we've seen how to handle multiple clients at once with both forked processes and spawned threads, and we've looked at a library class that encapsulates both schemes. Under both approaches, all client handlers seem to run in parallel with each other and with the main dispatch loop that continues watching for new incoming requests. Because all these tasks run in parallel (i.e., at the same time), the server doesn't get blocked when accepting new requests or when processing a long-running client handler.

Technically, though, threads and processes don't really run in parallel, unless you're lucky enough to have a machine with arbitrarily many CPUs. Instead, your operating system performs a juggling act -- it divides the computer's processing power among all active tasks. It runs part of one, then part of another, and so on. All the tasks appear to run in parallel, but only because the operating system switches focus between tasks so fast that you don't usually notice. This process of switching between tasks is sometimes called time-slicing when done by an operating system; it is more generally known as multiplexing.

When we spawn threads and processes, we rely on the operating system to juggle the active tasks, but there's no reason that a Python script can't do so as well. For instance, a script might divide tasks into multiple steps -- do a step of one task, then one of another, and so on, until all are completed. The script need only know how to divide its attention among the multiple active tasks to multiplex on its own.

Servers can apply this technique to yield yet another way to handle multiple clients at once, a way that requires neither threads nor forks. By multiplexing client connections and the main dispatcher with the select system call, a single event loop can process clients and accept new ones in parallel (or at least close enough to avoid stalling). Such servers are sometimes call asynchronous, because they service clients in spurts, as each becomes ready to communicate. In asynchronous servers, a single main loop run in a single process and thread decides which clients should get a bit of attention each time through. Client requests and the main dispatcher are each given a small slice of the server's attention if they are ready to converse.

Most of the magic behind this server structure is the operating system select call, available in Python's standard select module. Roughly, select is asked to monitor a list of input sources, output sources, and exceptional condition sources, and tells us which sources are ready for processing. It can be made to simply poll all the sources to see which are ready, wait for a maximum time period for sources to become ready, or wait indefinitely until one or more sources are ready for processing.

However used, select lets us direct attention to sockets ready to communicate, so as to avoid blocking on calls to ones that are not. That is, when the sources passed to select are sockets, we can be sure that socket calls like accept, recv, and send will not block (pause) the server when applied to objects returned by select. Because of that, a single-loop server that uses select need not get stuck communicating with one client or waiting for new ones, while other clients are starved for the server's attention.

10.4.4.1 A select-based echo server

Let's see how all this translates into code. The script in Example 10-9 implements another echo server, one that can handle multiple clients without ever starting new processes or threads.

Example 10-9. PP2E\Internet\Sockets\select-server.py
#################################################################
# Server: handle multiple clients in parallel with select.
# use the select module to multiplex among a set of sockets:
# main sockets which accept new client connections, and 
# input sockets connected to accepted clients; select can
# take an optional 4th arg--0 to poll, n.m to wait n.m secs, 
# ommitted to wait till any socket is ready for processing.
#################################################################

import sys, time
from select import select
from socket import socket, AF_INET, SOCK_STREAM
def now(): return time.ctime(time.time())

myHost = ''                             # server machine, '' means local host
myPort = 50007                          # listen on a non-reserved port number
if len(sys.argv) == 3:                  # allow host/port as cmdline args too
    myHost, myPort = sys.argv[1:]
numPortSocks = 2                        # number of ports for client connects

# make main sockets for accepting new client requests
mainsocks, readsocks, writesocks = [], [], []
for i in range(numPortSocks):
    portsock = socket(AF_INET, SOCK_STREAM)   # make a TCP/IP spocket object
    portsock.bind((myHost, myPort))           # bind it to server port number
    portsock.listen(5)                        # listen, allow 5 pending connects
    mainsocks.append(portsock)                # add to main list to identify
    readsocks.append(portsock)                # add to select inputs list 
    myPort = myPort + 1                       # bind on consecutive ports 

# event loop: listen and multiplex until server process killed
print 'select-server loop starting'
while 1:
    #print readsocks
    readables, writeables, exceptions = select(readsocks, writesocks, [])
    for sockobj in readables:
        if sockobj in mainsocks:                     # for ready input sockets
            # port socket: accept new client
            newsock, address = sockobj.accept()      # accept should not block
            print 'Connect:', address, id(newsock)   # newsock is a new socket
            readsocks.append(newsock)                # add to select list, wait
        else:
            # client socket: read next line
            data = sockobj.recv(1024)                # recv should not block
            print '\tgot', data, 'on', id(sockobj)
            if not data:                             # if closed by the clients 
                sockobj.close()                      # close here and remv from
                readsocks.remove(sockobj)            # del list else reselected 
            else:
                # this may block: should really select for writes too
                sockobj.send('Echo=>%s at %s' % (data, now()))

The bulk of this script is the big while event loop at the end that calls select to find out which sockets are ready for processing (these include main port sockets on which clients can connect, and open client connections). It then loops over all such ready sockets, accepting connections on main port sockets, and reading and echoing input on any client sockets ready for input. Both the accept and recv calls in this code are guaranteed to not block the server process after select returns; because of that, this server can get quickly back to the top of the loop to process newly arrived client requests and already-connected clients' inputs. The net effect is that all new requests and clients are serviced in pseudo-parallel fashion.

To make this process work, the server appends the connected socket for each client to the readables list passed to select, and simply waits for the socket to show up in the selected inputs list. For illustration purposes, this server also listens for new clients on more than one port -- on ports 50007 and 50008 in our examples. Because these main port sockets are also interrogated with select, connection requests on either port can be accepted without blocking either already-connected clients or new connection requests appearing on the other port. The select call returns whatever sockets in list readables are ready for processing -- both main port sockets and sockets connected to clients currently being processed.

10.4.4.2 Running the select server

Let's run this script locally to see how it does its stuff (the client and server can also be run on different machines, as in prior socket examples). First of all, we'll assume we've already started this server script in one window, and run a few clients to talk to it. The following code is the interaction in two such client windows running on Windows (MS-DOS consoles). The first client simply runs the echo-client script twice to contact the server, and the second also kicks off the testecho script to spawn eight echo-client programs running in parallel. As before, the server simply echoes back whatever text that clients send. Notice that the second client window really runs a script called echo-client-50008 so as to connect to the second port socket in the server; it's the same as echo-client, with a different port number (alas, the original script wasn't designed to input a port):

 [client window 1]
C:\...\PP2E\Internet\Sockets>python echo-client.py 
Client received: 'Echo=>Hello network world at Sun Aug 13 22:52:01 2000'

C:\...\PP2E\Internet\Sockets>python echo-client.py 
Client received: 'Echo=>Hello network world at Sun Aug 13 22:52:03 2000'

 [client window 2]
C:\...\PP2E\Internet\Sockets>python echo-client-50008.py localhost Sir Lancelot 
Client received: 'Echo=>Sir at Sun Aug 13 22:52:57 2000'
Client received: 'Echo=>Lancelot at Sun Aug 13 22:52:57 2000'

C:\...\PP2E\Internet\Sockets>python testecho.py 

Now, in the next code section is the sort of interaction and output that shows up in the window where the server has been started. The first three connections come from echo-client runs; the rest is the result of the eight programs spawned by testecho in the second client window. Notice that for testecho, new client connections and client inputs are all multiplexed together. If you study the output closely, you'll see that they overlap in time, because all activity is dispatched by the single event loop in the server.[7] Also note that the sever gets an empty string when the client has closed its socket. We take care to close and delete these sockets at the server right away, or else they would be needlessly reselected again and again, each time through the main loop:

 [server window]
C:\...\PP2E\Internet\Sockets>python select-server.py 
select-server loop starting
Connect: ('127.0.0.1', 1175) 7965520
        got Hello network world on 7965520
        got  on 7965520
Connect: ('127.0.0.1', 1176) 7964288
        got Hello network world on 7964288
        got  on 7964288
Connect: ('127.0.0.1', 1177) 7963920
        got Sir on 7963920
        got Lancelot on 7963920
        got  on 7963920

 [testecho results]
Connect: ('127.0.0.1', 1178) 7965216
        got Hello network world on 7965216
        got  on 7965216
Connect: ('127.0.0.1', 1179) 7963968
Connect: ('127.0.0.1', 1180) 7965424
        got Hello network world on 7963968
Connect: ('127.0.0.1', 1181) 7962976
        got Hello network world on 7965424
        got  on 7963968
        got Hello network world on 7962976
        got  on 7965424
        got  on 7962976
Connect: ('127.0.0.1', 1182) 7963648
        got Hello network world on 7963648
        got  on 7963648
Connect: ('127.0.0.1', 1183) 7966640
        got Hello network world on 7966640
        got  on 7966640
Connect: ('127.0.0.1', 1184) 7966496
        got Hello network world on 7966496
        got  on 7966496
Connect: ('127.0.0.1', 1185) 7965888
        got Hello network world on 7965888
        got  on 7965888

A subtle but crucial point: a time.sleep call to simulate a long-running task doesn't make sense in the server here -- because all clients are handled by the same single loop, sleeping would pause everything (and defeat the whole point of a multiplexing server). Here are a few additional notes before we move on:

Select call details

Formally, select is called with three lists of selectable objects (input sources, output sources, and exceptional condition sources), plus an optional timeout. The timeout argument may be a real wait expiration value in seconds (use floating-point numbers to express fractions of a second), a zero value to mean simply poll and return immediately, or be omitted to mean wait until at least one object is ready (as done in our server script earlier). The call returns a triple of ready objects -- subsets of the first three arguments -- any or all of which may be empty if the timeout expired before sources became ready.

Select portability

The select call works only for sockets on Windows, but also works for things like files and pipes on Unix and Macintosh. For servers running over the Internet, of course, sockets are the primary devices we are interested in.

Nonblocking sockets

select lets us be sure that socket calls like accept and recv won't block (pause) the caller, but it's also possible to make Python sockets nonblocking in general. Call the setblocking method of socket objects to set the socket to blocking or nonblocking mode. For example, given a call like sock.setblocking(flag), the socket sock is set to nonblocking mode if the flag is zero, and set to blocking mode otherwise. All sockets start out in blocking mode initially, so socket calls may always make the caller wait.

But when in nonblocking mode, a socket.error exception is raised if a recv socket call doesn't find any data, or if a send call can't immediately transfer data. A script can catch this exception to determine if the socket is ready for processing. In blocking mode, these calls always block until they can proceed. Of course, there may be much more to processing client requests than data transfers (requests may also require long-running computations), so nonblocking sockets don't guarantee that servers won't stall in general. They are simply another way to code multiplexing servers. Like select, they are better suited when client requests can be serviced quickly.

The asyncore module framework

If you're interested in using select, you will probably also be interested in checking out the asyncore.py module in the standard Python library. It implements a class-based callback model, where input and output callbacks are dispatched to class methods by a precoded select event loop. As such, it allows servers to be constructed without threads or forks. We'll learn more about this tool at the end of Chapter 15.

10.4.5 Choosing a Server Scheme

So when should you use select to build a server instead of threads or forks? Needs vary per application, of course, but servers based on the select call are generally considered to perform very well when client transactions are relatively short. If they are not short, threads or forks may be a better way to split processing among multiple clients. Threads and forks are especially useful if clients require long-running processing above and beyond socket calls.

It's important to remember that schemes based on select (and nonblocking sockets) are not completely immune to blocking. In the example earlier, for instance, the send call that echoes text back to a client might block, too, and hence stall the entire server. We could work around that blocking potential by using select to make sure that the output operation is ready before we attempt it (e.g., use the writesocks list and add another loop to send replies to ready output sockets), albeit at a noticeable cost in program clarity.

In general, though, if we cannot split up the processing of a client's request in such a way that it can be multiplexed with other requests and not block the server's loop, select may not be the best way to construct the server. Moreover, select also seems more complex than spawning either processes or threads, because we need to manually transfer control among all tasks (for instance, compare the threaded and select versions of this server, even without write selects). As usual, though, the degree of that complexity may vary per application.

10.5 A Simple Python File Server

Time for something more realistic. Let's conclude this chapter by putting some of these socket ideas to work in something a bit more useful than echoing text back and forth. Example 10-10 implements both the server-side and client-side logic needed to ship a requested file from server to client machines over a raw socket.

In effect, this script implements a simple file download system. One instance of the script is run on the machine where downloadable files live (the server), and another on the machines you wish to copy files to (the clients). Command-line arguments tell the script which flavor to run and optionally name the server machine and port number over which conversations are to occur. A server instance can respond to any number of client file requests at the port on which it listens, because it serves each in a thread.

Example 10-10. PP2E\Internet\Sockets\getfile.py
########################################################
# implement client and server side logic to transfer an
# arbitrary file from server to client over a socket; 
# uses a simple control-info protocol rather than 
# separate sockets for control and data (as in ftp),
# dispatches each client request to a handler thread,
# and loops to transfer the entire file by blocks; see
# ftplib examples for a higher-level transport scheme;
########################################################

import sys, os, thread, time
from socket import *
def now(): return time.ctime(time.time())

blksz = 1024
defaultHost = 'localhost'
defaultPort = 50001

helptext = """
Usage...
server=> getfile.py  -mode server            [-port nnn] [-host hhh|localhost]
client=> getfile.py [-mode client] -file fff [-port nnn] [-host hhh|localhost]
"""

def parsecommandline():
    dict = {}                        # put in dictionary for easy lookup
    args = sys.argv[1:]              # skip program name at front of args
    while len(args) >= 2:            # example: dict['-mode'] = 'server'
        dict[args[0]] = args[1]
        args = args[2:]
    return dict

def client(host, port, filename):
    sock = socket(AF_INET, SOCK_STREAM)
    sock.connect((host, port)) 
    sock.send(filename + '\n')                 # send remote name with dir
    dropdir = os.path.split(filename)[1]       # file name at end of dir path
    file = open(dropdir, 'wb')                 # create local file in cwd
    while 1:
        data = sock.recv(blksz)                # get up to 1K at a time
        if not data: break                     # till closed on server side
        file.write(data)                       # store data in local file
    sock.close()
    file.close()
    print 'Client got', filename, 'at', now()
    
def serverthread(clientsock):
    sockfile = clientsock.makefile('r')        # wrap socket in dup file obj
    filename = sockfile.readline()[:-1]        # get filename up to end-line
    try:
        file = open(filename, 'rb')
        while 1:
            bytes = file.read(blksz)           # read/send 1K at a time
            if not bytes: break                # until file totally sent
            sent = clientsock.send(bytes)
            assert sent == len(bytes)
    except:
        print 'Error downloading file on server:', filename
    clientsock.close()
        
def server(host, port):
    serversock = socket(AF_INET, SOCK_STREAM)     # listen on tcp/ip socket
    serversock.bind((host, port))                 # serve clients in threads
    serversock.listen(5)
    while 1:
        clientsock, clientaddr = serversock.accept()
        print 'Server connected by', clientaddr, 'at', now()
        thread.start_new_thread(serverthread, (clientsock,))
        
def main(args):
    host = args.get('-host', defaultHost)         # use args or defaults
    port = int(args.get('-port', defaultPort))    # is a string in argv
    if args.get('-mode') == 'server':             # None if no -mode: client
        if host == 'localhost': host = ''         # else fails remotely
        server(host, port)
    elif args.get('-file'):                       # client mode needs -file
        client(host, port, args['-file'])
    else:
        print helptext

if __name__ == '__main__':
    args = parsecommandline()
    main(args)

This script doesn't do much different than the examples we saw earlier. Depending on the command-line arguments passed, it invokes one of two functions:

The most novel feature here is the protocol between client and server: the client starts the conversation by shipping a filename string up to the server, terminated with an end-of-line character, and including the file's directory path in the server. At the server, a spawned thread extracts the requested file's name by reading the client socket, and opens and transfers the requested file back to the client, one chunk of bytes at a time.

10.5.1 Running the File Server and Clients

Since the server uses threads to process clients, we can test both client and server on the same Windows machine. First, let's start a server instance, and execute two client instances on the same machine while the server runs:

[server window, localhost]
C:\...\PP2E\Internet\Sockets>python getfile.py -mode server 
Server connected by ('127.0.0.1', 1089) at Thu Mar 16 11:54:21 2000
Server connected by ('127.0.0.1', 1090) at Thu Mar 16 11:54:37 2000

 [client window, localhost]
C:\...\Internet\Sockets>ls
class-server.py   echo.out.txt      testdir           thread-server.py
echo-client.py    fork-server.py    testecho.py
echo-server.py    getfile.py        testechowait.py

C:\...\Internet\Sockets>python getfile.py -file testdir\python15.lib -port 50001
Client got testdir\python15.lib at Thu Mar 16 11:54:21 2000

C:\...\Internet\Sockets>python getfile.py -file testdir\textfile
Client got testdir\textfile at Thu Mar 16 11:54:37 2000

Clients run in the directory where you want the downloaded file to appear -- the client instance code strips the server directory path when making the local file's name. Here the "download" simply copied the requested files up to the local parent directory (the DOS fc command compares file contents):

C:\...\Internet\Sockets>ls
class-server.py   echo.out.txt      python15.lib      testechowait.py
echo-client.py    fork-server.py    testdir           textfile
echo-server.py    getfile.py        testecho.py       thread-server.py

C:\...\Internet\Sockets>fc /B python1.lib testdir\python15.lib
Comparing files python15.lib and testdir\python15.lib
FC: no differences encountered

C:\...\Internet\Sockets>fc /B textfile testdir\textfile
Comparing files textfile and testdir\textfile
FC: no differences encountered

As usual, we can run server and clients on different machines as well. Here the script is being used to run a remote server on a Linux machine and a few clients on a local Windows PC (I added line breaks to some of the command lines to make them fit). Notice that client and server machine times are different now -- they are fetched from different machine's clocks and so may be arbitrarily skewed:

 [server telnet window: first message is the python15.lib request
 in client window1]
[lutz@starship lutz]$ python getfile.py -mode server 
Server connected by ('166.93.216.248', 1185) at Thu Mar 16 16:02:07 2000
Server connected by ('166.93.216.248', 1187) at Thu Mar 16 16:03:24 2000
Server connected by ('166.93.216.248', 1189) at Thu Mar 16 16:03:52 2000
Server connected by ('166.93.216.248', 1191) at Thu Mar 16 16:04:09 2000
Server connected by ('166.93.216.248', 1193) at Thu Mar 16 16:04:38 2000


 [client window 1: started first, runs in thread while other client
 requests are made in client window 2, and processed by other threads]
C:\...\Internet\Sockets>python getfile.py -mode client 
                             -host starship.python.net 
                             -port 50001 -file python15.lib 
Client got python15.lib at Thu Mar 16 14:07:37 2000

C:\...\Internet\Sockets>fc /B python15.lib testdir\python15.lib 
Comparing files python15.lib and testdir\python15.lib
FC: no differences encountered


 [client window 2: requests made while client window 1 request downloading]
C:\...\Internet\Sockets>python getfile.py 
                             -host starship.python.net -file textfile 
Client got textfile at Thu Mar 16 14:02:29 2000

C:\...\Internet\Sockets>python getfile.py 
                             -host starship.python.net -file textfile 
Client got textfile at Thu Mar 16 14:04:11 2000

C:\...\Internet\Sockets>python getfile.py 
                             -host starship.python.net -file textfile 
Client got textfile at Thu Mar 16 14:04:21 2000

C:\...\Internet\Sockets>python getfile.py 
                             -host starship.python.net -file index.html 
Client got index.html at Thu Mar 16 14:06:22 2000

C:\...\Internet\Sockets>fc textfile testdir\textfile 
Comparing files textfile and testdir\textfile
FC: no differences encountered

One subtle security point here: the server instance code is happy to send any server-side file whose pathname is sent from a client, as long as the server is run with a username that has read access to the requested file. If you care about keeping some of your server-side files private, you should add logic to suppress downloads of restricted files. I'll leave this as a suggested exercise here, but will implement such filename checks in the getfile download tool in Example 12-1.[8]

Making Sockets Look Like Files

For illustration purposes, getfile uses the socket object makefile method to wrap the socket in a file-like object. Once so wrapped, the socket can be read and written using normal file methods; getfile uses the file readline call to read the filename line sent by the client.

This isn't strictly required in this example -- we could have read this line with the socket recv call, too. In general, though, the makefile method comes in handy any time you need to pass a socket to an interface that expects a file.

For example, the pickle module's load and dump methods expect an object with a file-like interface (e.g., read and write methods), but don't require a physical file. Passing a TCP/IP socket wrapped with the makefile call to the pickler allows us to ship serialized Python objects over the Internet. See Chapter 16, for more details on object serialization interfaces.

More generally, any component that expects a file-like method protocol will gladly accept a socket wrapped with a socket object makefile call. Such interfaces will also accept strings wrapped with the built-in StringIO module, and any other sort of object that supports the same kinds of method calls as built-in file objects. As always in Python, we code to protocols -- object interfaces -- not to specific datatypes.

10.5.2 Adding a User-Interface Frontend

You might have noticed that we have been living in the realm of the command line for all of this chapter -- our socket clients and servers have been started from simple DOS or Linux shells. There is nothing stopping us from adding a nice point-and-click user interface to some of these scripts, though; GUI and network scripting are not mutually exclusive techniques. In fact, they can be arguably sexy when used together well.

For instance, it would be easy to implement a simple Tkinter GUI frontend to the client-side portion of the getfile script we just met. Such a tool, run on the client machine, may simply pop up a window with Entry widgets for typing the desired filename, server, and so on. Once download parameters have been input, the user interface could either import and call the getfile.client function with appropriate option arguments, or build and run the implied getfile.py command line using tools such as os.system, os.fork, thread, etc.

10.5.2.1 Using Frames and command lines

To help make this all more concrete, let's very quickly explore a few simple scripts that add a Tkinter frontend to the getfile client-side program. The first, in Example 10-11, creates a dialog for inputting server, port, and filename information, and simply constructs the corresponding getfile command line and runs it with os.system.

Example 10-11. PP2E\Internet\Sockets\getfilegui-1.py
##########################################################
# launch getfile script client from simple Tkinter GUI;
# could also or os.fork+exec, os.spawnv (see Launcher);
# windows: replace 'python' with 'start' if not on path; 
##########################################################

import sys, os
from Tkinter import *
from tkMessageBox import showinfo

def onReturnKey():
    cmdline = ('python getfile.py -mode client -file %s -port %s -host %s' %
                      (content['File'].get(),
                       content['Port'].get(), 
                       content['Server'].get()))
    os.system(cmdline)
    showinfo('getfilegui-1', 'Download complete')

box = Frame(Tk())
box.pack(expand=YES, fill=X)
lcol, rcol = Frame(box), Frame(box)
lcol.pack(side=LEFT)
rcol.pack(side=RIGHT, expand=Y, fill=X)

labels = ['Server', 'Port', 'File']
content = {}
for label in labels:
    Label(lcol, text=label).pack(side=TOP)
    entry = Entry(rcol)
    entry.pack(side=TOP, expand=YES, fill=X)
    content[label] = entry

box.master.title('getfilegui-1')
box.master.bind('<Return>', (lambda event: onReturnKey()))
mainloop()

When run, this script creates the input form shown in Figure 10-1. Pressing the Enter key (<Return>) runs a client-side instance of the getfile program; when the generated getfile command line is finished, we get the verification pop-up displayed in Figure 10-2.

Figure 10-1. getfilegui-1 in action

figs/ppy2_1001.gif

Figure 10-2. getfilegui-1 verification pop-up

figs/ppy2_1002.gif

10.5.2.2 Using grids and function calls

The first user-interface script (Example 10-11) uses the pack geometry manager and Frames to layout the input form, and runs the getfile client as a stand- alone program. It's just as easy to use the grid manager for layout, and import and call the client-side logic function instead of running a program. The script in Example 10-12 shows how.

Example 10-12. PP2E\Internet\Sockets\getfilegui-2.py
###############################################################
# same, but with grids and import+call, not packs and cmdline;
# direct function calls are usually faster than running files;
###############################################################

import getfile
from Tkinter import *
from tkMessageBox import showinfo

def onSubmit():
    getfile.client(content['Server'].get(),
                   int(content['Port'].get()), 
                   content['File'].get())
    showinfo('getfilegui-2', 'Download complete')

box    = Tk()
labels = ['Server', 'Port', 'File']
rownum  = 0
content = {}
for label in labels:
    Label(box, text=label).grid(col=0, row=rownum)
    entry = Entry(box)
    entry.grid(col=1, row=rownum, sticky=E+W)
    content[label] = entry
    rownum = rownum + 1

box.columnconfigure(0, weight=0)   # make expandable
box.columnconfigure(1, weight=1)
Button(text='Submit', command=onSubmit).grid(row=rownum, col=0, columnspan=2)

box.title('getfilegui-2')
box.bind('<Return>', (lambda event: onSubmit()))
mainloop()

This version makes a similar window (Figure 10-3), but adds a button at the bottom that does the same thing as an Enter key press -- it runs the getfile client procedure. Generally speaking, importing and calling functions (as done here) is faster than running command lines, especially if done more than once. The getfile script is set up to work either way -- as program or function library.

Figure 10-3. getfilegui-2 in action

figs/ppy2_1003.gif

10.5.2.3 Using a reusable form-layout class

If you're like me, though, writing all the GUI form layout code in those two scripts can seem a bit tedious, whether you use packing or grids. In fact, it became so tedious to me that I decided to write a general-purpose form-layout class, shown in Example 10-13, that handles most of the GUI layout grunt work.

Example 10-13. PP2E\Internet\Sockets\form.py
# a reusable form class, used by getfilegui (and others)

from Tkinter import *
entrysize = 40

class Form:                                           # add non-modal form box
    def __init__(self, labels, parent=None):          # pass field labels list
        box = Frame(parent)
        box.pack(expand=YES, fill=X)
        rows = Frame(box, bd=2, relief=GROOVE)        # box has rows, button
        lcol = Frame(rows)                            # rows has lcol, rcol
        rcol = Frame(rows)                            # button or return key,
        rows.pack(side=TOP, expand=Y, fill=X)         # runs onSubmit method
        lcol.pack(side=LEFT)
        rcol.pack(side=RIGHT, expand=Y, fill=X)
        self.content = {}
        for label in labels:
            Label(lcol, text=label).pack(side=TOP)
            entry = Entry(rcol, width=entrysize)
            entry.pack(side=TOP, expand=YES, fill=X)
            self.content[label] = entry
        Button(box, text='Cancel', command=self.onCancel).pack(side=RIGHT)
        Button(box, text='Submit', command=self.onSubmit).pack(side=RIGHT)
        box.master.bind('<Return>', (lambda event, self=self: self.onSubmit()))

    def onSubmit(self):                                      # override this
        for key in self.content.keys():                      # user inputs in 
            print key, '\t=>\t', self.content[key].get()     # self.content[k]

    def onCancel(self):                                      # override if need
        Tk().quit()                                          # default is exit

class DynamicForm(Form):
    def __init__(self, labels=None):
        import string
        labels = string.split(raw_input('Enter field names: '))
        Form.__init__(self, labels)
    def onSubmit(self):
        print 'Field values...'
        Form.onSubmit(self)           
        self.onCancel()              
        
if __name__ == '__main__':
    import sys
    if len(sys.argv) == 1:
        Form(['Name', 'Age', 'Job'])     # precoded fields, stay after submit
    else:
        DynamicForm()                    # input fields, go away after submit
    mainloop()    

Running this module standalone triggers its self-test code at the bottom. Without arguments (and when double-clicked in a Windows file explorer), the self-test generates a form with canned input fields captured in Figure 10-4, and displays the fields' values on Enter key presses or Submit button clicks:

C:\...\PP2E\Internet\Sockets>python form.py
Job     =>      Educator, Entertainer
Age     =>      38
Name    =>      Bob
Figure 10-4. Form test, canned fields

figs/ppy2_1004.gif

With a command-line argument, the form class module's self-test code prompts for an arbitrary set of field names for the form; fields can be constructed as dynamically as we like. Figure 10-5 shows the input form constructed in response to the following console interaction. Field names could be accepted on the command line, too, but raw_input works just as well for simple tests like this. In this mode, the GUI goes away after the first submit, because DynamicForm.onSubmit says so:

C:\...\PP2E\Internet\Sockets>python form.py -
Enter field names: Name Email Web Locale
Field values...
Email   =>      lutz@rmi.net
Locale  =>      Colorado
Web     =>      http://rmi.net/~lutz
Name    =>      mel
Figure 10-5. Form test, dynamic fields

figs/ppy2_1005.gif

And last but not least, Example 10-14 shows the getfile user interface again, this time constructed with the reusable form layout class. We need to fill in only the form labels list, and provide an onSubmit callback method of our own. All of the work needed to construct the form comes "for free," from the imported and widely reusable Form superclass.

Example 10-14. PP2E\Internet\Sockets\getfilegui.py
#################################################################
# launch getfile client with a reusable gui form class;
# os.chdir to target local dir if input (getfile stores in cwd);
# to do: use threads, show download status and getfile prints;
#################################################################

from form import Form
from Tkinter import Tk, mainloop
from tkMessageBox import showinfo
import getfile, os

class GetfileForm(Form):
    def __init__(self, oneshot=0):
        root = Tk()
        root.title('getfilegui')
        labels = ['Server Name', 'Port Number', 'File Name', 'Local Dir?']
        Form.__init__(self, labels, root)
        self.oneshot = oneshot
    def onSubmit(self):
        Form.onSubmit(self)
        localdir   = self.content['Local Dir?'].get()
        portnumber = self.content['Port Number'].get()
        servername = self.content['Server Name'].get()
        filename   = self.content['File Name'].get()
        if localdir:
            os.chdir(localdir)
        portnumber = int(portnumber)
        getfile.client(servername, portnumber, filename)
        showinfo('getfilegui', 'Download complete')
        if self.oneshot: Tk().quit()  # else stay in last localdir

if __name__ == '__main__':
    GetfileForm()
    mainloop()    

The form layout class imported here can be used by any program that needs to input form-like data; when used in this script, we get a user-interface like that shown in Figure 10-6 under Windows (and similar on other platforms).

Figure 10-6. getfilegui in action

figs/ppy2_1006.gif

Pressing this form's Submit button or the Enter key makes the getfilegui script call the imported getfile.client client-side function as before. This time, though, we also first change to the local directory typed into the form, so that the fetched file is stored there (getfile stores in the current working directory, whatever that may be when it is called). As usual, we can use this interface to connect to servers running locally on the same machine, or remotely. Here is some of the interaction we get for both modes:

 [talking to a local server]
C:\...\PP2E\Internet\Sockets>python getfilegui.py 
Port Number     =>      50001
Local Dir?      =>      temp
Server Name     =>      localhost
File Name       =>      testdir\python15.lib
Client got testdir\python15.lib at Tue Aug 15 22:32:34 2000

 [talking to a remote server]
[lutz@starship lutz]$ /usr/bin/python getfile.py -mode server -port 51234 
Server connected by ('38.28.130.229', 1111) at Tue Aug 15 21:48:13 2000

C:\...\PP2E\Internet\Sockets>python getfilegui.py 
Port Number     =>      51234
Local Dir?      =>      temp
Server Name     =>      starship.python.net
File Name       =>      public_html/index.html
Client got public_html/index.html at Tue Aug 15 22:42:06 2000

One caveat worth pointing out here: the GUI is essentially dead while the download is in progress (even screen redraws aren't handled -- try covering and uncovering the window and you'll see what I mean). We could make this better by running the download in a thread, but since we'll see how in the next chapter, you should consider this problem a preview.

In closing, a few final notes. First of all, I should point out that the scripts in this chapter use Tkinter tricks we've seen before and won't go into here in the interest of space; be sure to see the GUI chapters in this book for implementation hints.

Keep in mind, too, that these interfaces all just add a GUI on top of the existing script to reuse its code; any command-line tool can be easily GUI-ified in this way to make them more appealing and user-friendly. In the next chapter, for example, we'll meet a more useful client-side Tkinter user interface for reading and sending email over sockets (PyMailGui), which largely just adds a GUI to mail-processing tools. Generally speaking, GUIs can often be added as almost an afterthought to a program. Although the degree of user-interface and core logic separation can vary per program, keeping the two distinct makes it easier to focus on each.

And finally, now that I've shown you how to build user interfaces on top of this chapter's getfile, I should also say that they aren't really as useful as they might seem. In particular, getfile clients can talk only to machines that are running a getfile server. In the next chapter, we'll discover another way to download files -- FTP -- which also runs on sockets, but provides a higher-level interface, and is available as a standard service on many machines on the Net. We don't generally need to start up a custom server to transfer files over FTP, the way we do with getfile. In fact, the user-interface scripts in this chapter could be easily changed to fetch the desired file with Python's FTP tools, instead of the getfile module. But rather than spilling all the beans here, I'll just say "read on."

Using Serial Ports on Windows

Sockets, the main subject of this chapter, are the programmer's interface to network connections in Python scripts. As we've seen, they let us write scripts that converse with computers arbitrarily located on a network, and they form the backbone of the Internet and the Web.

If you're looking for a more low-level way to communicate with devices in general, though, you may also be interested in the topic of Python's serial port interfaces. This isn't quite related to Internet scripting and applies only on Windows machines, but it's similar enough in spirit and is discussed often enough on the Net to merit a quick look here.

Serial ports are known as "COM" ports on Windows (not to be confused with the COM object model), and are identified as "COM1", "COM2", and so on. By using interfaces to these ports, scripts may engage in low-level communication with things like mice, modems, and a wide variety of serial devices. Serial port interfaces are also used to communicate with devices connected over infrared ports (e.g., hand-held computers and remote modems). There are often other higher-level ways to access such devices (e.g., the PyRite package for ceasing Palm Pilot databases, or RAS for using modems), but serial port interfaces let scripts tap into raw data streams and implement device protocols of their own.

There are at least three ways to send and receive data over serial ports in Python scripts -- a public domain C extension package known as Serial, the proprietary MSComm COM server object interface published by Microsoft, and the low-level CreateFile file API call exported by the Python Windows extensions package, available via links at http://www.python.org.

Unfortunately, there's no time to cover any of these in detail in this text. For more information, the O'Reilly book Python Programming on Win32 includes an entire section dedicated to serial port communication topics. Also be sure to use the search tools at Python's web site for up-to-date details on this front.

[1]  Some books also use the term protocol to refer to lower-level transport schemes such as TCP. In this book, we use protocol to refer to higher-level structures built on top of sockets; see a networking text if you are curious about what happens at lower levels.

[2]  Since Python is an open source system, you can read the source code of the ftplib module if you are curious about how the underlying protocol actually works. See file ftplib.py in the standard source library directory in your machine. Its code is complex (since it must format messages and manage two sockets), but with the other standard Internet protocol modules, it is a good example of low-level socket programming.

[3]  The FTP command is standard on Windows machines and most others. On Windows, simply type it in a DOS console box to connect to an FTP server (or start your favorite FTP program); on Linux, type the FTP command in an xterm window. You'll need to supply your account name and password to connect to a non-anonymous FTP site. For anonymous FTP, use "anonymous" for the username and your email address for the password (anonymous FTP sites are generally limited).

[4]  Telnet is a standard command on Windows and Linux machines, too. On Windows, type it at a DOS console prompt or in the Start/Run dialog box (it can also be started via a clickable icon). Telnet usually runs in a window of its own.

[5]  You might be interested to know that the last part of this example, talking to port 80, is exactly what your web browser does as you surf the Net: followed links direct it to download web pages over this port. In fact, this lowly port is the primary basis of the Web. In Chapter 12, we will meet an entire application environment based upon sending data over port 80 -- CGI server-side scripting.

[6]  Incidentally, Python also comes with library tools that allow you to implement a full-blown HTTP (web) server that knows how to run server-side CGI scripts, in a few lines of Python code. We'll explore those tools in Chapter 15.

[7]  And the trace output on the server will probably look a bit different every time it runs. Clients and new connections are interleaved almost at random due to timing differences on the host machines.

[8]  We'll see three more getfile programs before we leave Internet scripting. The next chapter's getfile.py fetches a file with the higher-level FTP interface rather than using raw socket calls, and its http-getfile scripts fetch files over the HTTP protocol. Example 12-1 presents a getfile.cgi script that transfers file contents over the HTTP port in response to a request made in a web browser client (files are sent as the output of a CGI script). All four of the download schemes presented in this text ultimately use sockets, but only the version here makes that use explicit.

CONTENTS